Description of the minimizers of least squares regularized with $\ell_0$-norm.
Uniqueness of the global minimizer

Mila Nikolova* 



Abstract. We have an $M \times N$ real-valued arbitrary matrix $A$ (e.g. a dictionary) with $M < N$ and data $d$ describing the sought-after object with the help of $A$. This work provides an in-depth analysis of the (local and global) minimizers of an objective function $\mathcal{F}_d$ combining a quadratic data-fidelity term and an $\ell_0$ penalty applied to each entry of the sought-after solution, weighted by a regularization parameter $\beta > 0$. For several decades, this objective has attracted a ceaseless effort to conceive algorithms approaching a good minimizer. Our theoretical contributions, summarized below, shed new light on the existing algorithms and can help the conception of innovative numerical schemes. Solving the normal equation associated with any $M$-row submatrix of $A$ is equivalent to computing a local minimizer $\hat{u}$ of $\mathcal{F}_d$. (Local) minimizers $\hat{u}$ of $\mathcal{F}_d$ are strict if and only if the submatrix, composed of those columns of $A$ whose indexes form the support of $\hat{u}$, has full column rank. An outcome is that strict local minimizers of $\mathcal{F}_d$ are easily computed without knowing the value of $\beta$. Each strict local minimizer is linear in data. It is proved that $\mathcal{F}_d$ has global minimizers and that they are always strict. They are studied in more detail under the (standard) assumption that $\mathrm{rank}(A) = M < N$. The global minimizers with $M$-length support are seen to be impractical. Given $d$, critical values $\beta_K$ for any $K \leq M - 1$ are exhibited such that if $\beta > \beta_K$, all global minimizers of $\mathcal{F}_d$ are $K$-sparse. An assumption on $A$ is adopted and proved to fail only on a closed negligible subset. Then for all data $d$ beyond a closed negligible subset, the objective $\mathcal{F}_d$ for $\beta > \beta_K$, $K \leq M - 1$, has a unique global minimizer and this minimizer is $K$-sparse. Instructive small-size ($5 \times 10$) numerical illustrations confirm the main theoretical results.
1. Introduction. Let $A$ be an arbitrary matrix (e.g., a dictionary) such that

    $A \in \mathbb{R}^{M \times N}$ for $M < N$,

where the positive integers $M$ and $N$ are fixed. Given a data vector $d \in \mathbb{R}^M$, we consider an objective function $\mathcal{F}_d : \mathbb{R}^N \to \mathbb{R}$ of the form

(1)    $\mathcal{F}_d(u) = \|Au - d\|_2^2 + \beta \|u\|_0, \quad \beta > 0,$

       $\|u\|_0 = \sharp\, \sigma(u),$
where $u \in \mathbb{R}^N$ contains the coefficients describing the sought-after object, $\beta$ is a regularization parameter, $\sharp$ stands for cardinality and $\sigma(u)$ is the support of $u$ (i.e., the set of all $i \in \{1, \ldots, N\}$ for which the $i$th entry of $u$ satisfies $u[i] \neq 0$). By an abuse of language, the penalty in (1) is called the $\ell_0$-norm. Define $\phi : \mathbb{R} \to \mathbb{R}$ by

(2)    $\phi(t) := \begin{cases} 0 & \text{if } t = 0, \\ 1 & \text{if } t \neq 0. \end{cases}$

Then $\|u\|_0 = \sum_{i=1}^{N} \phi(u[i]) = \sum_{i \in \sigma(u)} \phi(u[i])$, so $\mathcal{F}_d$ in (1) equivalently reads

(3)    $\mathcal{F}_d(u) = \|Au - d\|_2^2 + \beta \sum_{i=1}^{N} \phi(u[i]) = \|Au - d\|_2^2 + \beta \sum_{i \in \sigma(u)} \phi(u[i]).$
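As a minimal illustration of the objective in (1), the sketch below evaluates it with plain Python lists. The function names `l0_norm` and `objective` and the small $2 \times 3$ matrix are hypothetical choices made for this example only; they do not come from the paper.

```python
def l0_norm(u):
    """Cardinality of the support of u, i.e. the number of nonzero entries."""
    return sum(1 for x in u if x != 0)


def objective(A, d, u, beta):
    """F_d(u) = ||A u - d||_2^2 + beta * ||u||_0, with A given as a list of rows."""
    residual = [sum(a * x for a, x in zip(row, u)) - di for row, di in zip(A, d)]
    return sum(r * r for r in residual) + beta * l0_norm(u)


# Hypothetical toy data with M = 2 < N = 3.
A = [[1.0, 0.0, 1.0],
     [0.0, 1.0, 1.0]]
d = [1.0, 2.0]

print(objective(A, d, [0.0, 0.0, 0.0], beta=0.5))  # u = 0 gives ||d||^2 = 5.0
print(objective(A, d, [1.0, 2.0, 0.0], beta=0.5))  # exact fit, two nonzeros: 1.0
```

The two calls show the trade-off that $\beta$ controls: the zero vector pays only the data-fidelity term, whereas an exact fit pays $\beta$ per nonzero coefficient.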

We focus on all (local and global) minimizers $\hat{u}$ of an objective $\mathcal{F}_d$ of the form (1):

(4)    $\hat{u} \in \mathbb{R}^N$ such that $\mathcal{F}_d(\hat{u}) = \min_{u \in O} \mathcal{F}_d(u),$

where $O$ is an open neighborhood of $\hat{u}$. We note that finding a global minimizer of $\mathcal{F}_d$ must be an NP-hard computational problem [11, 40].

The function $\phi$ in (2) served as a regularizer for a long time. In the context of Markov random fields it was used by Geman and Geman in 1984 [20] and Besag in 1986 [5] as a prior in MAP energies to restore labeled images. The MAP objective reads as

(5)    $\mathcal{F}_d(u) = \|Au - d\|_2^2 + \beta \sum_{k} \phi(D_k u),$

where $D_k$ is a finite difference operator and $\phi$ is given by (2). This label-designed form is
known as the Potts prior model, or as the multi-level logistic model [6, 24]. Various stochastic 
and deterministic algorithms have been considered to minimize (5). Leclerc [23] proposed in 
1989 a deterministic continuation method to restore piecewise constant images. Robini, Lachal 
and Magnin [33] introduced the stochastic continuation approach and successfully used it to 
reconstruct 3D tomographic images. Robini and Magnin refined the method and the theory in 
[34]. Very recently, Robini and Reissman [35] gave theoretical results relating the probability 
for global convergence and the computation speed. 

The problem stated in (1) and (4), namely to (locally) minimize $\mathcal{F}_d$, arises when sparse solutions are desired. Typical application fields are signal and image processing, morphologic component analysis, compression, dictionary building, inverse problems, compressive sensing, machine learning, model selection, classification, and subset selection, among others. The original hard-thresholding method proposed by Donoho and Johnstone [15] amounts to¹ minimizing $\mathcal{F}_d$, where $d$ contains the coefficients of a signal or an image expanded in a wavelet basis ($M = N$). When $M < N$, various (usually strong) restrictions on $\|u\|_0$ (often $\|u\|_0$ is replaced by a less irregular function) and on $A$ (e.g., RIP-like criteria, conditions on $\|A\|$, etc.) are needed to conceive numerical schemes approximating a minimizer of $\mathcal{F}_d$, to establish local convergence and derive the asymptotics of the obtained solution. In statistics the problem has been widely considered for subset selection, and numerous algorithms have been designed, with limited theoretical production, as explained in the book by Miller [30]. More recently, Haupt and Nowak [22] investigate the statistical performances of the global minimizer of $\mathcal{F}_d$ and propose an iterative bound-optimization procedure. Fan and Li [17] discuss a variable splitting and penalty decomposition minimization technique for (1), along with other approximations of the $\ell_0$-norm. Liu and Wu [25] mix the $\ell_0$ and $\ell_1$ penalties, establish some asymptotic properties of the new estimator and use mixed integer programming aimed at global minimization. For model selection, Lv and Fan [27] approximate the $\ell_0$ penalty using functions that are concave on $\mathbb{R}_+$ and prove a nonasymptotic nearly oracle property of the resultant estimator. Thiao, Dinh, and Thi [39] reformulate the problem so that an approximate solution can be found using difference-of-convex-functions programming. Blumensath and Davies [7] propose an iterative thresholding scheme to approximate a solution and prove convergence to a local minimizer of $\mathcal{F}_d$. Lu and Zhang [26] suggest a penalty decomposition method to minimize $\mathcal{F}_d$. Fornasier and Ward [18] propose an iterative thresholding algorithm for minimizing $\mathcal{F}_d$ where $\ell_0$ is replaced by a reasonable sparsity-promoting relaxation given by $\phi(t) = \min\{|t|, 1\}$; then convergence to a local minimizer is established. In a recent paper by Chouzenoux et al. [9], a mixed $\ell_2$-$\ell_0$ regularization is considered: a slightly smoothed version of the objective is analyzed and a majorize-minimize subspace approach, satisfying a finite length property, converges to a critical point. Since the submission of our paper, image reconstruction methods have been designed where $\ell_0$ regularization is applied to the coefficients of the expansion of the sought-after image in a wavelet frame [42, 14]: the provided numerical results outperform $\ell_1$ regularization for a reasonable computational cost achieved using penalty decomposition techniques. In a general study on the convergence of descent methods for nonconvex objectives, Attouch, Bolte, and Svaiter [1] apply an inexact forward-backward splitting scheme to find a critical point of $\mathcal{F}_d$. Several other references can be evoked, e.g., [31, 19].

¹ As a reminder, if $d$ are some noisy coefficients, the restored coefficients $\hat{u}$ minimize $\|u - d\|^2 + \beta \|u\|_0$ and read $\hat{u}[i] = 0$ if $|d[i]| \leq \sqrt{\beta}$ and $\hat{u}[i] = d[i]$ otherwise.
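The hard-thresholding rule recalled in the footnote on the Donoho and Johnstone method (the case where $A$ is the identity and $M = N$) can be sketched in a few lines. The function name `hard_threshold` and the sample data are illustrative assumptions, not part of the paper.

```python
import math


def hard_threshold(d, beta):
    """Entrywise minimizer of ||u - d||^2 + beta * ||u||_0.

    An entry is kept only if |d[i]| > sqrt(beta); ties are set to zero,
    which matches the convention of the footnote.
    """
    t = math.sqrt(beta)
    return [x if abs(x) > t else 0.0 for x in d]


d = [0.3, -2.0, 1.0, 0.05]           # hypothetical noisy coefficients
print(hard_threshold(d, beta=1.0))   # [0.0, -2.0, 0.0, 0.0]
```

Each entry is decided independently: keeping $d[i]$ costs $\beta$, discarding it costs $d[i]^2$, so the threshold sits at $\sqrt{\beta}$.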

Even though it has been studied for several decades, the objective $\mathcal{F}_d$ was essentially considered from a numerical standpoint. The motivation naturally comes from the promising applications and the intrinsic difficulty of minimizing $\mathcal{F}_d$.

The goal of this work is to analyze the (local and global) minimizers $\hat{u}$ of objectives $\mathcal{F}_d$ of the form (1).

• We provide detailed results on the minimization problem.

• The uniqueness of the global minimizer of $\mathcal{F}_d$ is examined as well.

We do not propose an algorithm. However, our theoretical results raise salient questions about 
the existing algorithms and can help the conception of innovative numerical schemes. 
The minimization of $\mathcal{F}_d$ in (1) might seem close to its constrained variants:

(6)    given $\varepsilon \geq 0$, minimize $\|u\|_0$ subject to $\|Au - d\|^2 \leq \varepsilon$,

       given $K \in \mathbb{I}_M$, minimize $\|Au - d\|^2$ subject to $\|u\|_0 \leq K$.

The latter problems are abundantly studied in the context of sparse recovery in different fields. An excellent account is given in [8], see also the book [28]. For recent achievements, we refer the reader to [10]. It is worth emphasizing that in general, there is no equivalence between the problems stated in (6) and the minimization of $\mathcal{F}_d$ in (1) because all of these problems are nonconvex.

1.1. Main notation and definitions. We recall that if $\hat{u}$ is a (local) minimizer of $\mathcal{F}_d$, the value $\mathcal{F}_d(\hat{u})$ is a (local) minimum² of $\mathcal{F}_d$ reached at (possibly numerous) points $\hat{u}$. Saying that a (local) minimizer $\hat{u}$ is strict means that there is a neighborhood $O \subset \mathbb{R}^N$, containing $\hat{u}$, such that $\mathcal{F}_d(\hat{u}) < \mathcal{F}_d(v)$ for any $v \in O \setminus \{\hat{u}\}$. So $\hat{u}$ is an isolated minimizer.

Let $K$ be any positive integer. The expression $\{u \in \mathbb{R}^K : u \text{ satisfying property } \mathfrak{P}\}$ designates the subset of $\mathbb{R}^K$ formed from all elements $u$ that meet $\mathfrak{P}$. The identity operator on $\mathbb{R}^K$ is denoted by $I_K$. The entries of a vector $u \in \mathbb{R}^K$ read as $u[i]$, for any $i$. The $i$th vector of the canonical basis³ of $\mathbb{R}^K$ is denoted by $e_i \in \mathbb{R}^K$. Given $u \in \mathbb{R}^K$ and $\rho > 0$, the open ball at $u$ of radius $\rho$ with respect to the $\ell_p$-norm for $1 \leq p \leq \infty$ reads as

    $B_p(u, \rho) := \{v \in \mathbb{R}^K : \|v - u\|_p < \rho\}.$
To simplify the notation, the $\ell_2$-norm is systematically denoted by

    $\|\cdot\| := \|\cdot\|_2.$

We denote by $\mathbb{I}_K$ the totally and strictly ordered index set⁴

(7)    $\mathbb{I}_K := (\{1, \ldots, K\}, <),$

where the symbol $<$ stands for the natural order of the positive integers. Accordingly, any subset $\omega \subseteq \mathbb{I}_K$ inherits the property of being totally and strictly ordered.

We shall often consider the index set $\mathbb{I}_N$. The complement of $\omega \subseteq \mathbb{I}_N$ in $\mathbb{I}_N$ is denoted by

    $\omega^c := \mathbb{I}_N \setminus \omega \subseteq \mathbb{I}_N.$

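Definition 1.1. For any $u \in \mathbb{R}^N$, the support $\sigma(u)$ of $u$ is defined by

    $\sigma(u) := \{i \in \mathbb{I}_N : u[i] \neq 0\} \subseteq \mathbb{I}_N.$

If $u = 0$, clearly $\sigma(u) = \emptyset$.

The $i$th column in a matrix $A \in \mathbb{R}^{M \times N}$ is denoted by $a_i$. It is systematically assumed that

(8)    $a_i \neq 0 \quad \forall\, i \in \mathbb{I}_N.$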



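For a matrix $A \in \mathbb{R}^{M \times N}$ and a vector $u \in \mathbb{R}^N$, with any $\omega \subseteq \mathbb{I}_N$, we associate the submatrix $A_\omega$ and the subvector $u_\omega$ given by

(9)    $A_\omega := (a_{\omega[1]}, \ldots, a_{\omega[\sharp\omega]}) \in \mathbb{R}^{M \times \sharp\omega},$

(10)   $u_\omega := (u[\omega[1]], \ldots, u[\omega[\sharp\omega]]) \in \mathbb{R}^{\sharp\omega},$

respectively, as well as the zero padding operator $Z_\omega : \mathbb{R}^{\sharp\omega} \to \mathbb{R}^N$ that inverts (10):

(11)   $u = Z_\omega(u_\omega), \quad u[i] := \begin{cases} 0 & \text{if } i \notin \omega, \\ u_\omega[k] & \text{for the unique } k \text{ such that } \omega[k] = i. \end{cases}$

Thus for $\omega = \emptyset$ one finds $u_\emptyset = \emptyset$ and $u = Z_\emptyset(u_\emptyset) = 0 \in \mathbb{R}^N$.

² These two terms are often confused in the literature.

³ More precisely, for any $i \in \mathbb{I}_K$, the vector $e_i \in \mathbb{R}^K$ is defined by $e_i[i] = 1$ and $e_i[j] = 0$, $\forall\, j \in \mathbb{I}_K \setminus \{i\}$.

⁴ E.g. without strict order we have $\omega = \{1, 2, 3\} = \{2, 1, 1, 3\}$, in which case the notation in (9)-(10) is ambiguous.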

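Using Definition 1.1 and the notation in (9)-(10), for any $u \in \mathbb{R}^N \setminus \{0\}$ we have

(12)   $\omega \subseteq \mathbb{I}_N$ and $\omega \supseteq \sigma(u) \implies Au = A_\omega u_\omega.$

To simplify the presentation, we adopt the following definitions⁵:

(13)   (a) $A_\emptyset = [\ ] \in \mathbb{R}^{M \times 0},$
       (b) $\mathrm{rank}(A_\emptyset) = 0.$

In order to avoid possible ambiguities⁶, we set

    $A_\omega^T := (A_\omega)^T,$

where the superscript $T$ stands for transposed. If $A_\omega$ is invertible, similarly $A_\omega^{-1} := (A_\omega)^{-1}$.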

In the course of this work, we shall frequently refer to the constrained quadratic optimization problem stated next.

Given $d \in \mathbb{R}^M$ and $\omega \subseteq \mathbb{I}_N$, problem $(\mathcal{P}_\omega)$ reads as:

(14)   $\min_{u \in \mathbb{R}^N} \|Au - d\|^2$ subject to $u[i] = 0, \ \forall\, i \in \omega^c. \qquad (\mathcal{P}_\omega)$

Clearly, problem $(\mathcal{P}_\omega)$ always admits a solution.

The definition below will be used to evaluate the extent of some subsets and assumptions.

Definition 1.2. A property (an assumption) is called generic on $\mathbb{R}^K$ if it holds true on a dense open subset of $\mathbb{R}^K$.

As usual, a subset $S \subset \mathbb{R}^K$ is said to be negligible in $\mathbb{R}^K$ if there exists $Z \subset \mathbb{R}^K$ whose Lebesgue measure in $\mathbb{R}^K$ is $\mathbb{L}^K(Z) = 0$ and $S \subseteq Z$. If a property fails only on a negligible set, it is said to hold almost everywhere, meaning "with probability one". Definition 1.2 requires much more than almost everywhere. Let us explain.

If a property holds true for all $v \in \mathbb{R}^K \setminus S$, where $S \subseteq Z \subset \mathbb{R}^K$, $Z$ is closed in $\mathbb{R}^K$ and $\mathbb{L}^K(Z) = 0$, then this property is generic on $\mathbb{R}^K$. Indeed, $\mathbb{R}^K \setminus Z$ contains a dense open subset of $\mathbb{R}^K$. So if a property is generic on $\mathbb{R}^K$, then it holds true almost everywhere on $\mathbb{R}^K$. But the converse is false: an almost everywhere true property is not generic if the closure of its negligible subset has a positive measure,⁷ because then $\mathbb{R}^K \setminus Z$ does not contain a dense open subset of $\mathbb{R}^K$. In this sense, a generic property is stable with respect to the objects to which it applies.

The elements of a set $S \subset \mathbb{R}^K$ where a generic property fails are highly exceptional in $\mathbb{R}^K$. The chance that a truly random $v \in \mathbb{R}^K$ (i.e., a $v$ following a nonsingular probability distribution on $\mathbb{R}^K$) comes across such an $S$ can be ignored in practice.

⁵ Note that (a) corresponds to the zero mapping on $\mathbb{R}^0$ and that (b) is the usual definition for the rank of an empty matrix.

⁶ In the light of (9), $A_\omega^T$ could also mean $(A^T)_\omega$.

⁷ There are many examples, e.g. $Z := \{x \in [0, 1] : x \text{ is rational}\}$; then $\mathbb{L}^1(Z) = 0$ and $\mathbb{L}^1(\mathrm{closure}(Z)) = 1$.

1.2. Content of the paper. The main result in section 2 tells us that finding a solution of $(\mathcal{P}_\omega)$ for $\omega \subset \mathbb{I}_N$ is equivalent to computing a (local) minimizer of $\mathcal{F}_d$. In section 3 we prove that the (local) minimizers $\hat{u}$ of $\mathcal{F}_d$ are strict if and only if the submatrix $A_{\sigma(\hat{u})}$ has full column rank. The strict minimizers of $\mathcal{F}_d$ are shown to be linear in data $d$. The importance of the $(M-1)$-sparse strict minimizers is emphasized. The global minimizers of $\mathcal{F}_d$ are studied in section 4. Their existence is proved. They are shown to be strict for any $d$ and for any $\beta > 0$. More details are provided under the standard assumption that $\mathrm{rank}(A) = M < N$. Given $d \in \mathbb{R}^M$, critical values $\beta_K$ for $K \in \mathbb{I}_{M-1}$ are exhibited such that all global minimizers of $\mathcal{F}_d$ are $K$-sparse⁸ if $\beta > \beta_K$.

In section 5, a gentle assumption on $A$ is shown to be generic for all $M \times N$ real matrices. Under this assumption, for all data $d \in \mathbb{R}^M$ beyond a closed negligible subset, the objective $\mathcal{F}_d$ for $\beta > \beta_K$, $K \in \mathbb{I}_{M-1}$, has a unique global minimizer and this minimizer is $K$-sparse.

Small size ($A$ is $5 \times 10$) numerical tests in section 6 illustrate the main theoretical results.

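2. All minimizers of $\mathcal{F}_d$.

2.1. Preliminary results. First, we give some basic facts on problem $(\mathcal{P}_\omega)$ as defined in (14) that are needed for later use. If $\omega = \emptyset$, then $\omega^c = \mathbb{I}_N$, so the unique solution of $(\mathcal{P}_\omega)$ is $\hat{u} = 0$. For an arbitrary $\omega \subset \mathbb{I}_N$ meeting $\sharp\omega \geq 1$, $(\mathcal{P}_\omega)$ amounts to minimizing a quadratic term with respect to only $\sharp\omega$ components of $u$, the remaining entries being null. This quadratic problem $(\mathcal{Q}_\omega)$ reads as

(15)   $\min_{v \in \mathbb{R}^{\sharp\omega}} \|A_\omega v - d\|^2, \quad \sharp\omega \geq 1, \qquad (\mathcal{Q}_\omega)$

and it always admits a solution. Using the zero-padding operator $Z_\omega$ in (11), we have

    $\big[\, \hat{u}_\omega \in \mathbb{R}^{\sharp\omega} \text{ solves } (\mathcal{Q}_\omega) \text{ and } \hat{u} = Z_\omega(\hat{u}_\omega) \,\big] \iff \big[\, \hat{u} \in \mathbb{R}^N \text{ solves } (\mathcal{P}_\omega), \ \sharp\omega \geq 1 \,\big].$

The optimality conditions for $(\mathcal{Q}_\omega)$, combined with the definition in (13)(a), give rise to the following equivalence, which holds true for any $\omega \subseteq \mathbb{I}_N$:

(16)   $\big[\, \hat{u} \in \mathbb{R}^N \text{ solves } (\mathcal{P}_\omega) \,\big] \iff \big[\, \hat{u}_\omega \in \mathbb{R}^{\sharp\omega} \text{ solves } A_\omega^T A_\omega v = A_\omega^T d \text{ and } \hat{u} = Z_\omega(\hat{u}_\omega) \,\big].$

Note that $A_\omega^T A_\omega v = A_\omega^T d$ in (16) is the normal equation associated with $A_\omega v = d$. The remark below shows that the optimal value of $(\mathcal{P}_\omega)$ in (14) can also be seen as an orthogonal projection problem.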
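The equivalence (16) suggests a direct way to compute a solution of $(\mathcal{P}_\omega)$: restrict $A$ to the columns in $\omega$, solve the least-squares problem, and zero-pad. The following is only a sketch under stated assumptions: it uses NumPy, 0-based column indices (the paper's are 1-based), a hypothetical helper name `solve_P_omega`, and random test data.

```python
import numpy as np


def solve_P_omega(A, d, omega):
    """Return a solution of min ||A u - d||^2 subject to u[i] = 0 for i not in omega.

    omega: sorted list of 0-based column indices (an assumption of this sketch).
    """
    u = np.zeros(A.shape[1])
    if len(omega) == 0:
        return u                      # omega empty: the unique solution is u = 0
    A_w = A[:, omega]
    # lstsq returns a solution of the normal equation A_w^T A_w v = A_w^T d
    # (the minimum-norm one if A_w is rank deficient).
    u_w, *_ = np.linalg.lstsq(A_w, d, rcond=None)
    u[omega] = u_w                    # zero-padding operator Z_omega
    return u


rng = np.random.default_rng(0)        # hypothetical random test data
A = rng.standard_normal((5, 10))
d = rng.standard_normal(5)
omega = [1, 4, 7]
u = solve_P_omega(A, d, omega)
A_w = A[:, omega]
# The residual of the normal equation should vanish up to rounding.
print(np.linalg.norm(A_w.T @ (A_w @ u[omega] - d)))
```

Note that $\beta$ does not appear anywhere in this computation.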

⁸ As usual, a vector $u$ is said to be $K$-sparse if $\|u\|_0 \leq K$.
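Remark 1. Let $r := \mathrm{rank}(A_\omega)$ and $B_\omega \in \mathbb{R}^{M \times r}$ be an orthonormal basis for $\mathrm{range}(A_\omega)$. Then $A_\omega = B_\omega H_\omega$ for a uniquely defined matrix $H_\omega \in \mathbb{R}^{r \times \sharp\omega}$ with $\mathrm{rank}(H_\omega) = r$. Using (16), we have

    $A_\omega^T A_\omega \hat{u}_\omega = A_\omega^T d \iff H_\omega^T H_\omega \hat{u}_\omega = H_\omega^T B_\omega^T d \iff H_\omega \hat{u}_\omega = B_\omega^T d \iff A_\omega \hat{u}_\omega = B_\omega B_\omega^T d.$

In addition, $\Pi_{\mathrm{range}(A_\omega)} := B_\omega B_\omega^T$ is the orthogonal projector onto the subspace spanned by the columns of $A_\omega$, see e.g. [29]. The expression above combined with (16) shows that

    $\big[\, \hat{u} \in \mathbb{R}^N \text{ solves } (\mathcal{P}_\omega) \,\big] \iff \big[\, \hat{u}_\omega \in \mathbb{R}^{\sharp\omega} \text{ meets } A_\omega \hat{u}_\omega = \Pi_{\mathrm{range}(A_\omega)} d \text{ and } \hat{u} = Z_\omega(\hat{u}_\omega) \,\big].$

Obviously, $A\hat{u} = A_\omega \hat{u}_\omega$ is the orthogonal projection of $d$ onto the basis $B_\omega$.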
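The projection identity of Remark 1 can be checked numerically. In the sketch below, which is an illustration rather than part of the paper, the orthonormal basis $B_\omega$ is obtained from a reduced QR factorization; the random $5 \times 3$ submatrix is a hypothetical example assumed to have full column rank (true with probability one for Gaussian entries), so that $r = 3$.

```python
import numpy as np

rng = np.random.default_rng(4)
A_w = rng.standard_normal((5, 3))     # hypothetical submatrix A_omega
d = rng.standard_normal(5)

# Reduced QR: A_w = B @ H with B (5 x 3) having orthonormal columns.
B, H = np.linalg.qr(A_w)

# Least-squares solution of the normal equation associated with A_w v = d.
u_w, *_ = np.linalg.lstsq(A_w, d, rcond=None)

# A_w u_w should coincide with the orthogonal projection B B^T d.
print(np.linalg.norm(A_w @ u_w - B @ (B.T @ d)))
```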
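For $\omega \subseteq \mathbb{I}_N$, let $\mathrm{K}_\omega$ denote the vector subspace

(17)   $\mathrm{K}_\omega := \{v \in \mathbb{R}^N : v[i] = 0, \ \forall\, i \in \omega^c\}.$

This notation enables problem $(\mathcal{P}_\omega)$ in (14) to be rewritten as

(18)   $\min_{u \in \mathrm{K}_\omega} \|Au - d\|^2.$

The technical lemma below will be used in what follows. We emphasize that its statement is independent of the vector $\hat{u} \in \mathbb{R}^N \setminus \{0\}$.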

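Lemma 2.1. Let $d \in \mathbb{R}^M$, $\beta > 0$, and $\hat{u} \in \mathbb{R}^N \setminus \{0\}$ be arbitrary. For $\hat{\sigma} := \sigma(\hat{u})$, set

(19)   $\rho := \min\left\{ \min_{i \in \hat{\sigma}} |\hat{u}[i]|, \ \frac{\beta}{2\left(\|A^T(A\hat{u} - d)\|_1 + 1\right)} \right\}.$

Then $\rho > 0$.

(i) For $\phi$ as defined in (2), we have

    $v \in B_\infty(0, \rho) \implies \sum_{i \in \mathbb{I}_N} \phi(\hat{u}[i] + v[i]) = \sum_{i \in \hat{\sigma}} \phi(\hat{u}[i]) + \sum_{i \in \hat{\sigma}^c} \phi(v[i]).$

(ii) For $\mathrm{K}_{\hat{\sigma}}$ defined according to (17), $\mathcal{F}_d$ satisfies

    $v \in B_\infty(0, \rho) \cap (\mathbb{R}^N \setminus \mathrm{K}_{\hat{\sigma}}) \implies \mathcal{F}_d(\hat{u} + v) \geq \mathcal{F}_d(\hat{u}),$

where the inequality is strict whenever $\hat{\sigma}^c \neq \emptyset$.

The proof is outlined in Appendix 8.1.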

2.2. The (local) minimizers of $\mathcal{F}_d$ solve quadratic problems.

It is worth emphasizing that no special assumptions on the matrix $A$ are adopted. We begin with an easy but cautionary result.

Lemma 2.2. For any $d \in \mathbb{R}^M$ and for all $\beta > 0$, $\mathcal{F}_d$ has a strict (local) minimum at $\hat{u} = 0 \in \mathbb{R}^N$.
Proof. Using the fact that $\mathcal{F}_d(0) = \|d\|^2 \geq 0$, we have

(20)   $\mathcal{F}_d(v) = \|Av - d\|^2 + \beta \|v\|_0 = \mathcal{F}_d(0) + \mathcal{R}_d(v),$

(21)   where $\mathcal{R}_d(v) := \|Av\|^2 - 2\langle v, A^T d\rangle + \beta \|v\|_0.$

Noticing that $\beta \|v\|_0 \geq \beta > 0$ for $v \neq 0$ leads to

    $v \in B_2\left(0, \frac{\beta}{2\|A^T d\| + 1}\right) \setminus \{0\} \implies \mathcal{R}_d(v) \geq -2\|v\| \, \|A^T d\| + \beta > 0.$

Inserting this implication into (20) proves the lemma. □

For any $\beta > 0$ and $d \in \mathbb{R}^M$, the sparsest strict local minimizer of $\mathcal{F}_d$ reads $\hat{u} = 0$. Initialization with zero of a suboptimal algorithm should generally be a bad choice. Indeed, experiments have shown that such an initialization can be harmful; see, e.g., [30, 7].

The next proposition states a result that is often evoked in this work.

Proposition 2.3. Let $d \in \mathbb{R}^M$. Given an $\omega \subseteq \mathbb{I}_N$, let $\hat{u}$ solve problem $(\mathcal{P}_\omega)$ as formulated in (14). Then for any $\beta > 0$, the objective $\mathcal{F}_d$ in (1) reaches a (local) minimum at $\hat{u}$ and

(22)   $\sigma(\hat{u}) \subseteq \omega,$

where $\sigma(\hat{u})$ is given in Definition 1.1.

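Proof. Let $\hat{u}$ solve problem $(\mathcal{P}_\omega)$, and let $\beta > 0$. The constraint in $(\mathcal{P}_\omega)$ entails (22).

Consider that $\hat{u} \neq 0$, in which case for $\hat{\sigma} := \sigma(\hat{u})$ we have $1 \leq \sharp\hat{\sigma} \leq \sharp\omega$. Using the equivalent formulation of $(\mathcal{P}_\omega)$ given in (17)-(18) yields

(23)   $v \in \mathrm{K}_\omega \implies \|A(\hat{u} + v) - d\|^2 \geq \|A\hat{u} - d\|^2.$

The inclusion in (22) is equivalent to $\omega^c \subseteq \hat{\sigma}^c$. Let $\mathrm{K}_{\hat{\sigma}}$ be defined according to (17) as well. Then

    $\mathrm{K}_{\hat{\sigma}} \subseteq \mathrm{K}_\omega.$

Combining the latter relation with (23) leads to

(24)   $v \in \mathrm{K}_{\hat{\sigma}} \implies \|A(\hat{u} + v) - d\|^2 \geq \|A\hat{u} - d\|^2.$

Let $\rho$ be defined as in (19), Lemma 2.1. Noticing that by (2) and (17)

(25)   $v \in \mathrm{K}_{\hat{\sigma}} \implies \phi(v[i]) = 0 \quad \forall\, i \in \hat{\sigma}^c,$

the following inequality chain is derived for $v \in B_\infty(0, \rho) \cap \mathrm{K}_{\hat{\sigma}}$:

    $\begin{aligned}
    \mathcal{F}_d(\hat{u} + v) &= \|A(\hat{u} + v) - d\|^2 + \beta \sum_{i \in \mathbb{I}_N} \phi(\hat{u}[i] + v[i]) \\
    \text{[by Lemma 2.1(i)]} \quad &= \|A(\hat{u} + v) - d\|^2 + \beta \sum_{i \in \hat{\sigma}} \phi(\hat{u}[i]) + \beta \sum_{i \in \hat{\sigma}^c} \phi(v[i]) \\
    \text{[by (25)]} \quad &= \|A(\hat{u} + v) - d\|^2 + \beta \sum_{i \in \hat{\sigma}} \phi(\hat{u}[i]) \\
    \text{[by (24)]} \quad &\geq \|A\hat{u} - d\|^2 + \beta \sum_{i \in \hat{\sigma}} \phi(\hat{u}[i]) \\
    \text{[by (3)]} \quad &= \mathcal{F}_d(\hat{u}).
    \end{aligned}$

Combining the obtained implication with Lemma 2.1(ii) shows that

    $\mathcal{F}_d(\hat{u} + v) \geq \mathcal{F}_d(\hat{u}) \quad \forall\, v \in B_\infty(0, \rho).$

If $\hat{u} = 0$, this is a (local) minimizer of $\mathcal{F}_d$ by Lemma 2.2. □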
Many authors mention that initialization is paramount for the success of approximate algorithms minimizing $\mathcal{F}_d$. In view of Proposition 2.3, if one already has a well-elaborated initialization, it could be enough to solve the relevant problem $(\mathcal{P}_\omega)$.

The statement reciprocal to Proposition 2.3 is obvious but it helps the presentation.

Lemma 2.4. For $d \in \mathbb{R}^M$ and $\beta > 0$, let $\mathcal{F}_d$ have a (local) minimum at $\hat{u}$. Then $\hat{u}$ solves $(\mathcal{P}_{\hat{\sigma}})$ for $\hat{\sigma} := \sigma(\hat{u})$.

Proof. Let $\hat{u}$ be a (local) minimizer of $\mathcal{F}_d$. Denote $\hat{\sigma} := \sigma(\hat{u})$. Then $\hat{u}$ solves the problem

    $\min_{u \in \mathbb{R}^N} \left\{ \|Au - d\|^2 + \beta\, \sharp\hat{\sigma} \right\}$ subject to $u[i] = 0 \ \ \forall\, i \in \hat{\sigma}^c.$

Since $\sharp\hat{\sigma}$ is a constant, $\hat{u}$ solves $(\mathcal{P}_{\hat{\sigma}})$. □

Remark 2. By Proposition 2.3 and Lemma 2.4, solving $(\mathcal{P}_\omega)$ for some $\omega \subseteq \mathbb{I}_N$ is equivalent to finding a (local) minimizer of $\mathcal{F}_d$.

This equivalence underlies most of the theory developed in this work.

Corollary 2.5. For $d \in \mathbb{R}^M$ and $\beta > 0$, let $\hat{u}$ be a (local) minimizer of $\mathcal{F}_d$. Set $\hat{\sigma} := \sigma(\hat{u})$. Then

(26)   $\hat{u} = Z_{\hat{\sigma}}(\hat{u}_{\hat{\sigma}})$, where $\hat{u}_{\hat{\sigma}}$ satisfies $A_{\hat{\sigma}}^T A_{\hat{\sigma}} \hat{u}_{\hat{\sigma}} = A_{\hat{\sigma}}^T d.$

Conversely, if $\hat{u} \in \mathbb{R}^N$ satisfies (26) for $\hat{\sigma} = \sigma(\hat{u})$, then $\hat{u}$ is a (local) minimizer of $\mathcal{F}_d$.

Proof. By Lemma 2.4, $\hat{u}$ solves $(\mathcal{P}_{\hat{\sigma}})$. The equation in (26) follows directly from (16). The last claim is a straightforward consequence of (16) and Proposition 2.3. □

Remark 3. Equation (26) shows that a (local) minimizer $\hat{u}$ of $\mathcal{F}_d$ follows a pseudo-hard thresholding scheme⁹: the nonzero part $\hat{u}_{\hat{\sigma}}$ of $\hat{u}$ is the least squares solution with respect to the submatrix $A_{\hat{\sigma}}$ and the whole data vector $d$ is involved in its computation. Unlike the hard thresholding scheme in [15], insignificant or purely noisy data entries can hardly be discarded from $\hat{u}$ and they threaten to pollute its nonzero part $\hat{u}_{\hat{\sigma}}$. See also Remark 6.

Noisy data $d$ should degrade $\hat{u}_{\hat{\sigma}}$ and this effect is stronger if $A_{\hat{\sigma}}^T A_{\hat{\sigma}}$ is ill-conditioned [13]. The quality of the outcome critically depends on the selected (local) minimizer and on the pertinence of $A$.

It may be interesting to evoke another consequence of Proposition 2.3:

Remark 4. Given $d \in \mathbb{R}^M$, for any $\omega \subseteq \mathbb{I}_N$, $\mathcal{F}_d$ has a (local) minimizer $\hat{u}$ defined by (26) and obeying $\sigma(\hat{u}) \subseteq \omega$.

⁹ In a Bayesian setting, the quadratic data fidelity term in $\mathcal{F}_d$ models data corrupted with Gaussian i.i.d. noise.
3. The strict minimizers of $\mathcal{F}_d$.

We remind, yet again, that no special assumptions on $A \in \mathbb{R}^{M \times N}$ are adopted.

Strict minimizers of an objective function enable unambiguous solutions of inverse problems. The definition below is useful in characterizing the strict minimizers of $\mathcal{F}_d$.

Definition 3.1. Given a matrix $A \in \mathbb{R}^{M \times N}$, for any $r \in \mathbb{I}_M$ we define $\Omega_r$ as the subset of all $r$-length supports that correspond to full column rank $M \times r$ submatrices of $A$, i.e.,

    $\Omega_r := \{\omega \subset \mathbb{I}_N : \sharp\omega = r = \mathrm{rank}(A_\omega)\}.$

Set $\Omega_0 := \emptyset$ and define as well

    $\Omega := \bigcup_{r=0}^{M-1} \Omega_r$ and $\Omega_{\max} := \Omega \cup \Omega_M.$

Definition 3.1 shows that for any $r \in \mathbb{I}_M$,

    $\mathrm{rank}(A) = r \geq 1 \iff \Omega_r \neq \emptyset$ and $\Omega_t = \emptyset \ \ \forall\, t \geq r + 1.$

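3.1. How to recognize a strict minimizer of $\mathcal{F}_d$? The theorem below gives an exhaustive answer to this question.

Theorem 3.2. Given $d \in \mathbb{R}^M$ and $\beta > 0$, let $\hat{u}$ be a (local) minimizer of $\mathcal{F}_d$. Define

    $\hat{\sigma} := \sigma(\hat{u}).$

The following statements are equivalent:

(i) The (local) minimum that $\mathcal{F}_d$ has at $\hat{u}$ is strict;

(ii) $\mathrm{rank}(A_{\hat{\sigma}}) = \sharp\hat{\sigma}$;

(iii) $\hat{\sigma} \in \Omega_{\max}$.

If $\hat{u}$ is a strict (local) minimizer of $\mathcal{F}_d$, then it reads

(27)   $\hat{u} = Z_{\hat{\sigma}}(\hat{u}_{\hat{\sigma}})$ for $\hat{u}_{\hat{\sigma}} = \left(A_{\hat{\sigma}}^T A_{\hat{\sigma}}\right)^{-1} A_{\hat{\sigma}}^T d$

and satisfies $\sharp\hat{\sigma} = \|\hat{u}\|_0 \leq M$.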

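Proof. We break the proof into four parts.

[(i) $\implies$ (ii)]¹⁰. We recall that by the rank-nullity theorem [21, 29]

(28)   $\dim \ker(A_{\hat{\sigma}}) = \sharp\hat{\sigma} - \mathrm{rank}(A_{\hat{\sigma}}).$

Let $\hat{u} \neq 0$ be a strict (local) minimizer of $\mathcal{F}_d$. Assume that (ii) fails. Then (28) implies that

(29)   $\dim \ker(A_{\hat{\sigma}}) \geq 1.$

By Lemma 2.4, $\hat{u}$ solves $(\mathcal{P}_{\hat{\sigma}})$. Let $\rho$ read as in (19) and let $\mathrm{K}_{\hat{\sigma}}$ be defined according to (17). Noticing that

(30)   $v \in \mathrm{K}_{\hat{\sigma}}, \ \hat{\sigma} \neq \emptyset \implies Av = A_{\hat{\sigma}} v_{\hat{\sigma}},$

Lemma 2.1(i) shows that for $v \in B_\infty(0, \rho) \cap \mathrm{K}_{\hat{\sigma}}$ with $v_{\hat{\sigma}} \in \ker(A_{\hat{\sigma}})$,

    $\begin{aligned}
    \mathcal{F}_d(\hat{u} + v) &= \|A_{\hat{\sigma}}(\hat{u}_{\hat{\sigma}} + v_{\hat{\sigma}}) - d\|^2 + \beta \sum_{i \in \mathbb{I}_N} \phi(\hat{u}[i] + v[i]) \\
    \text{[by Lemma 2.1(i)]} \quad &= \|A_{\hat{\sigma}} \hat{u}_{\hat{\sigma}} - d\|^2 + \beta \sum_{i \in \hat{\sigma}} \phi(\hat{u}[i]) + \beta \sum_{i \in \hat{\sigma}^c} \phi(v[i]) \\
    \text{[by (25)]} \quad &= \|A_{\hat{\sigma}} \hat{u}_{\hat{\sigma}} - d\|^2 + \beta \sum_{i \in \hat{\sigma}} \phi(\hat{u}[i]) \\
    \text{[by (3)]} \quad &= \mathcal{F}_d(\hat{u}),
    \end{aligned}$

i.e., that $\hat{u}$ is not a strict minimizer, which contradicts (i). Hence the assumption in (29) is false. Therefore (ii) holds true.

If $\hat{u} = 0$, then $\hat{\sigma} = \emptyset$; hence $A_{\hat{\sigma}} \in \mathbb{R}^{M \times 0}$ and $\mathrm{rank}(A_{\hat{\sigma}}) = 0 = \sharp\hat{\sigma}$ according to (13).

[(ii) $\implies$ (i)]. Let $\hat{u}$ be a minimizer of $\mathcal{F}_d$ that satisfies (ii). To have $\sharp\hat{\sigma} = 0$ is equivalent to $\hat{u} = 0$. By Lemma 2.2, $\hat{u}$ is a strict minimizer. Focus on $\sharp\hat{\sigma} \geq 1$. Since $\mathrm{rank}(A_{\hat{\sigma}}) = \sharp\hat{\sigma} \leq M$ and problem $(\mathcal{Q}_\omega)$ in (15) is strictly convex for $\omega = \hat{\sigma}$, its unique solution $\hat{u}_{\hat{\sigma}}$ satisfies

    $v \in \mathbb{R}^{\sharp\hat{\sigma}} \setminus \{0\} \implies \|A_{\hat{\sigma}}(\hat{u}_{\hat{\sigma}} + v) - d\|^2 > \|A_{\hat{\sigma}} \hat{u}_{\hat{\sigma}} - d\|^2.$

Using (30), this is equivalent to

(31)   $v \in \mathrm{K}_{\hat{\sigma}} \setminus \{0\} \implies \|A(\hat{u} + v) - d\|^2 = \|A_{\hat{\sigma}}(\hat{u}_{\hat{\sigma}} + v_{\hat{\sigma}}) - d\|^2 > \|A_{\hat{\sigma}} \hat{u}_{\hat{\sigma}} - d\|^2 = \|A\hat{u} - d\|^2.$

Lemma 2.1(i), along with (25), yields for $v \in B_\infty(0, \rho) \cap \mathrm{K}_{\hat{\sigma}} \setminus \{0\}$

    $\begin{aligned}
    \mathcal{F}_d(\hat{u} + v) &= \|A(\hat{u} + v) - d\|^2 + \beta \sum_{i \in \hat{\sigma}} \phi(\hat{u}[i]) \\
    \text{[by (31)]} \quad &> \|A\hat{u} - d\|^2 + \beta \sum_{i \in \hat{\sigma}} \phi(\hat{u}[i]) \\
    &= \mathcal{F}_d(\hat{u}).
    \end{aligned}$

Since $\sharp\hat{\sigma} \leq M \leq N - 1$, we have $\hat{\sigma}^c \neq \emptyset$. So Lemma 2.1(ii) tells us that

    $v \in B_\infty(0, \rho) \setminus \mathrm{K}_{\hat{\sigma}} \implies \mathcal{F}_d(\hat{u} + v) > \mathcal{F}_d(\hat{u}).$

Combining the last two implications proves (i).

[(ii) $\iff$ (iii)]. Comparing (iii) with Definitions 1.1 and 3.1 proves the claim.

[Equation (27)]. The proof follows from equation (26) in Corollary 2.5 where¹¹ $A_{\hat{\sigma}}^T A_{\hat{\sigma}}$ is invertible. □

¹⁰ This part can alternatively be proven using Remark 1.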

Theorem 3.2 provides a simple rule enabling one to verify whether or not a numerical scheme has reached a strict (local) minimizer of $\mathcal{F}_d$.
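The rule of Theorem 3.2(ii) amounts to a rank test on the columns indexed by the support. A minimal sketch follows, assuming NumPy and assuming that the vector passed in is already known to be a (local) minimizer; the helper name `has_strict_support` and the toy matrix with two identical columns are hypothetical.

```python
import numpy as np


def has_strict_support(A, u):
    """Theorem 3.2(ii): rank(A_sigma) == #sigma, where sigma is the support of u.

    Only meaningful when u is already a (local) minimizer of the objective.
    """
    sigma = np.flatnonzero(u)
    if sigma.size == 0:
        return True                   # u = 0 is always strict (Lemma 2.2)
    return bool(np.linalg.matrix_rank(A[:, sigma]) == sigma.size)


# Toy matrix whose first two columns coincide; data d = (1, 0).
A = np.array([[1.0, 1.0, 0.0],
              [0.0, 0.0, 1.0]])
u_strict = np.array([1.0, 0.0, 0.0])      # support {0}: full column rank
u_nonstrict = np.array([0.5, 0.5, 0.0])   # support {0, 1}: rank 1 < 2

print(has_strict_support(A, u_strict), has_strict_support(A, u_nonstrict))
```

Both vectors fit $d = (1, 0)$ exactly, but the second one can be moved along the kernel of $A_{\hat{\sigma}}$ without changing the objective, so it is not strict.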

The notations $\Omega_r$, $\Omega$ and $\Omega_{\max}$ are frequently used in this paper. Their interpretation is obvious in the light of Theorem 3.2. For any $d \in \mathbb{R}^M$ and for all $\beta > 0$, the set $\Omega_{\max}$ is composed of the supports of all possible strict (local) minimizers of $\mathcal{F}_d$, while $\Omega$ is the subset of those that are $(M-1)$-sparse.

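An easy and useful corollary is presented next.

Corollary 3.3. Let $d \in \mathbb{R}^M$. Given an arbitrary $\omega \in \Omega_{\max}$, let $\hat{u}$ solve $(\mathcal{P}_\omega)$. Then

(i) $\hat{u}$ reads as

(32)   $\hat{u} = Z_\omega(\hat{u}_\omega)$, where $\hat{u}_\omega = \left(A_\omega^T A_\omega\right)^{-1} A_\omega^T d,$

and obeys $\hat{\sigma} := \sigma(\hat{u}) \subseteq \omega$ and $\hat{\sigma} \in \Omega_{\max}$;

(ii) for any $\beta > 0$, $\hat{u}$ is a strict (local) minimizer of $\mathcal{F}_d$;

(iii) $\hat{u}$ solves $(\mathcal{P}_{\hat{\sigma}})$.

Proof. Using (16), $\hat{u}$ fulfills (i) since $A_\omega^T A_\omega$ is invertible and $\sigma(\hat{u}) \subseteq \omega$ by the constraint in $(\mathcal{P}_\omega)$. If $\hat{\sigma} = \emptyset$, (ii) follows from Lemma 2.2. For $\sharp\hat{\sigma} \geq 1$, $A_{\hat{\sigma}}$ is an $M \times \sharp\hat{\sigma}$ submatrix of $A_\omega$. Since $\mathrm{rank}(A_\omega) = \sharp\omega$, we have $\mathrm{rank}(A_{\hat{\sigma}}) = \sharp\hat{\sigma}$ and so $\hat{\sigma} \in \Omega_{\max}$. By Proposition 2.3, $\hat{u}$ is a (local) minimizer of $\mathcal{F}_d$, and Theorem 3.2 leads to (ii). Lemma 2.4 and Corollary 3.3(ii) yield (iii). □

Remark 5. One can easily compute a strict (local) minimizer $\hat{u}$ of $\mathcal{F}_d$ without knowing the value of the regularization parameter $\beta$. Just data $d$ and an $\omega \in \Omega_{\max}$ are needed. This consequence of Corollary 3.3 might be striking.

Clearly, the support $\sigma(\bar{u})$ of a nonstrict local minimizer $\bar{u}$ of $\mathcal{F}_d$ contains some subsupports yielding strict (local) minimizers of $\mathcal{F}_d$. It is easy to see that among them, there is $\hat{\sigma} \subsetneq \sigma(\bar{u})$ such that the corresponding $\hat{u}$ given by (27) strictly decreases the value of $\mathcal{F}_d$; i.e., $\mathcal{F}_d(\hat{u}) < \mathcal{F}_d(\bar{u})$.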

3.2. Every strict (local) minimizer of $\mathcal{F}_d$ is linear in $d$. Here we explore the behavior of the strict (local) minimizers of $\mathcal{F}_d$ with respect to variations of $d$. An interesting sequel of Theorem 3.2 is presented in the following corollary.

Corollary 3.4. For $d \in \mathbb{R}^M$ and $\beta > 0$, let $\hat{u}$ be a (local) minimizer of $\mathcal{F}_d$ satisfying $\hat{\sigma} := \sigma(\hat{u}) \in \Omega$. Define

    $\mathrm{N}_{\hat{\sigma}} := \ker\left(A_{\hat{\sigma}}^T\right) \subset \mathbb{R}^M.$

We have $\dim \mathrm{N}_{\hat{\sigma}} = M - \sharp\hat{\sigma} \geq 1$ and

    $d' \in \mathrm{N}_{\hat{\sigma}} \implies \mathcal{F}_{d+d'}$ has a strict (local) minimum at $\hat{u}$.

Proof. Since $\hat{\sigma} \in \Omega$, the minimizer $\hat{u}$ is strict by Theorem 3.2. By $d' \in \ker\left(A_{\hat{\sigma}}^T\right)$ we find $A_{\hat{\sigma}}^T(d + d') = A_{\hat{\sigma}}^T d$ for any $d' \in \mathrm{N}_{\hat{\sigma}}$. Inserting this into (27) in Theorem 3.2 yields the result. □

All data located in the vector subspace $\mathrm{N}_{\hat{\sigma}} \supsetneq \{0\}$ yield the same strict (local) minimizer $\hat{u}$.

¹¹ For $\hat{\sigma} = \emptyset$, (11) and (13)(a) show that (27) yields $\hat{u} = 0$.

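Remark 6. If data contain noise $n$, it can be decomposed in a unique way as $n = n_{\mathrm{N}_{\hat{\sigma}}} + n_{\mathrm{N}_{\hat{\sigma}}^\perp}$, where $n_{\mathrm{N}_{\hat{\sigma}}} \in \mathrm{N}_{\hat{\sigma}}$ and $n_{\mathrm{N}_{\hat{\sigma}}^\perp} \in \mathrm{N}_{\hat{\sigma}}^\perp$. The component $n_{\mathrm{N}_{\hat{\sigma}}}$ is removed (Corollary 3.4), while $n_{\mathrm{N}_{\hat{\sigma}}^\perp}$ is transformed according to (27) and added to $\hat{u}_{\hat{\sigma}}$.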
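Corollary 3.4 can be illustrated numerically: a perturbation of the data lying in the kernel of the transposed submatrix leaves the strict minimizer unchanged. The sketch below assumes NumPy, 0-based indices, a hypothetical support `sigma` and random data; the helper `strict_minimizer` simply evaluates formula (27).

```python
import numpy as np

rng = np.random.default_rng(5)
M, N = 5, 10
A = rng.standard_normal((M, N))
d = rng.standard_normal(M)
sigma = [2, 5, 6]                     # hypothetical support with #sigma <= M - 1
A_s = A[:, sigma]


def strict_minimizer(data):
    """Formula (27): u_sigma = (A_s^T A_s)^{-1} A_s^T data, zero elsewhere."""
    u = np.zeros(N)
    u[sigma] = np.linalg.solve(A_s.T @ A_s, A_s.T @ data)
    return u


# Build d' in ker(A_s^T) by removing from a random g its projection onto range(A_s).
B, _ = np.linalg.qr(A_s)
g = rng.standard_normal(M)
d_prime = g - B @ (B.T @ g)

print(np.linalg.norm(A_s.T @ d_prime))                                      # close to 0
print(np.linalg.norm(strict_minimizer(d + d_prime) - strict_minimizer(d)))  # close to 0
```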

We shall use the following definition.

Definition 3.5. Let $O \subseteq \mathbb{R}^M$ be an open domain. We say that $\mathcal{U} : O \to \mathbb{R}^N$ is a local minimizer function for the family of objectives $\mathfrak{F}_O := \{\mathcal{F}_d : d \in O\}$ if for any $d \in O$, the function $\mathcal{F}_d$ reaches a strict (local) minimum at $\mathcal{U}(d)$.

Corollary 3.3 shows that for any $d \in \mathbb{R}^M$, each strict (local) minimizer of $\mathcal{F}_d$ is entirely described by an $\omega \in \Omega_{\max}$ via equation (32) in the same corollary. Consequently, a local minimizer function $\mathcal{U}$ is associated with every $\omega \in \Omega_{\max}$.

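Lemma 3.6. For some arbitrarily fixed $\omega \in \Omega_{\max}$ and $\beta > 0$, the family of functions $\mathfrak{F}_{\mathbb{R}^M}$ has a linear (local) minimizer function $\mathcal{U} : \mathbb{R}^M \to \mathbb{R}^N$ that reads as

(33)   $\forall\, d \in \mathbb{R}^M, \quad \mathcal{U}(d) = Z_\omega(U_\omega d)$, where $U_\omega = \left(A_\omega^T A_\omega\right)^{-1} A_\omega^T \in \mathbb{R}^{\sharp\omega \times M}.$

Proof. The function $\mathcal{U}$ in (33) is linear in $d$. From Corollary 3.3, for any $\beta > 0$ and for any $d \in \mathbb{R}^M$, $\mathcal{F}_d$ has a strict (local) minimum at $\hat{u} = \mathcal{U}(d)$. Hence $\mathcal{U}$ fits Definition 3.5. □

Thus, even if $\mathcal{F}_d$ has many strict (local) minimizers, each is linear in $d$.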
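The linear minimizer function in (33) can be written down explicitly for a fixed support. The following sketch assumes NumPy, a hypothetical support `omega` (0-based) and a random Gaussian matrix, whose selected columns are taken to have full column rank; it checks linearity on one linear combination of two data vectors.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 10))      # hypothetical 5 x 10 matrix
omega = [0, 3, 8]                     # hypothetical support, assumed in Omega_max
A_w = A[:, omega]

# U_omega = (A_w^T A_w)^{-1} A_w^T, a (#omega x M) matrix that does not depend on beta.
U_w = np.linalg.solve(A_w.T @ A_w, A_w.T)


def U(d):
    """Local minimizer function (33): zero-pad U_omega d to length N."""
    u = np.zeros(A.shape[1])
    u[omega] = U_w @ d
    return u


d1 = rng.standard_normal(5)
d2 = rng.standard_normal(5)
print(np.linalg.norm(U(2.0 * d1 - 3.0 * d2) - (2.0 * U(d1) - 3.0 * U(d2))))  # close to 0
```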
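Next we exhibit a closed negligible subset of $\mathbb{R}^M$, associated with a nonempty $\omega \in \Omega_{\max}$, whose elements are data $d$ leading to $\|\mathcal{U}(d)\|_0 < \sharp\omega$.

Lemma 3.7. For any $\omega \in \Omega_{\max}$, define the subset $\mathrm{D}_\omega \subset \mathbb{R}^M$ by

(34)   $\mathrm{D}_\omega := \bigcup_{i=1}^{\sharp\omega} \left\{ g \in \mathbb{R}^M : \left\langle e_i, \left(A_\omega^T A_\omega\right)^{-1} A_\omega^T g \right\rangle = 0 \right\}.$

Then $\mathrm{D}_\omega$ is closed in $\mathbb{R}^M$ and $\mathbb{L}^M(\mathrm{D}_\omega) = 0$.

Proof. If $\omega = \emptyset$ then $\mathrm{D}_\omega = \emptyset$, hence the claim. Let $\sharp\omega \geq 1$. For some $i \in \mathbb{I}_{\sharp\omega}$, set

    $\mathrm{D} := \left\{ g \in \mathbb{R}^M : \left\langle e_i, \left(A_\omega^T A_\omega\right)^{-1} A_\omega^T g \right\rangle = 0 \right\} = \left\{ g \in \mathbb{R}^M : \left\langle A_\omega \left(A_\omega^T A_\omega\right)^{-1} e_i, g \right\rangle = 0 \right\}.$

Since $\mathrm{rank}\left(A_\omega \left(A_\omega^T A_\omega\right)^{-1}\right) = \sharp\omega$, $\ker\left(A_\omega \left(A_\omega^T A_\omega\right)^{-1}\right) = \{0\}$. Hence $A_\omega \left(A_\omega^T A_\omega\right)^{-1} e_i \neq 0$. Therefore $\mathrm{D}$ is a vector subspace of $\mathbb{R}^M$ of dimension $M - 1$ and so $\mathbb{L}^M(\mathrm{D}) = 0$. The conclusion follows from the fact that $\mathrm{D}_\omega$ in (34) is the union of $\sharp\omega$ subsets like $\mathrm{D}$ (see, e.g., [36, 16]). □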
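Proposition 3.8. For some arbitrarily fixed $\omega \in \Omega_{\max}$ and $\beta > 0$, let $\mathcal{U} : \mathbb{R}^M \to \mathbb{R}^N$ be the relevant (local) minimizer function for $\mathfrak{F}_{\mathbb{R}^M}$ as given in (33) (Lemma 3.6). Let $\mathrm{D}_\omega$ read as in (34). Then the function $d \mapsto \mathcal{F}_d(\mathcal{U}(d))$ is $C^\infty$ on $\mathbb{R}^M \setminus \mathrm{D}_\omega$ and

    $d \in \mathbb{R}^M \setminus \mathrm{D}_\omega \implies \sigma(\mathcal{U}(d)) = \omega,$

where the set $\mathbb{R}^M \setminus \mathrm{D}_\omega$ contains an open and dense subset of $\mathbb{R}^M$.

Proof. The statement about $\mathbb{R}^M \setminus \mathrm{D}_\omega$ is a direct consequence of Lemma 3.7.

If $\omega = \emptyset$, then $\mathcal{U}(d) = 0$ for all $d \in \mathbb{R}^M$, so all claims in the proposition are trivial. Consider that $\sharp\omega \geq 1$. For any $i \in \mathbb{I}_{\sharp\omega}$, the $\omega[i]$th component of $\mathcal{U}(d)$ reads as (see Lemma 3.6)

    $\mathcal{U}_{\omega[i]}(d) = \left\langle e_i, \left(A_\omega^T A_\omega\right)^{-1} A_\omega^T d \right\rangle.$

The definition of $\mathrm{D}_\omega$ shows that

    $d \in \mathbb{R}^M \setminus \mathrm{D}_\omega$ and $i \in \mathbb{I}_{\sharp\omega} \implies \mathcal{U}_{\omega[i]}(d) \neq 0,$

whereas $\mathcal{U}_i(d) = 0$ for all $i \in \omega^c$. Consequently,

    $\omega \in \Omega_{\max}$ and $d \in \mathbb{R}^M \setminus \mathrm{D}_\omega \implies \sigma(\mathcal{U}(d)) = \omega.$

Then $\|\mathcal{U}(d)\|_0$ is constant on $\mathbb{R}^M \setminus \mathrm{D}_\omega$ and

    $\omega \in \Omega_{\max}$ and $d \in \mathbb{R}^M \setminus \mathrm{D}_\omega \implies \mathcal{F}_d(\mathcal{U}(d)) = \|A\,\mathcal{U}(d) - d\|^2 + \beta\, \sharp\omega.$

We infer from (33) that $d \mapsto \|A\,\mathcal{U}(d) - d\|^2$ is $C^\infty$ on $\mathbb{R}^M$, so $d \mapsto \mathcal{F}_d(\mathcal{U}(d))$ is $C^\infty$ on $\mathbb{R}^M \setminus \mathrm{D}_\omega$. □

A generic property is that a local minimizer function corresponding to $\mathcal{F}_d$ produces solutions sharing the same support. The application $d \mapsto \mathcal{F}_d(\mathcal{U}(d))$ is discontinuous on the closed negligible subset $\mathrm{D}_\omega$, where the support of $\mathcal{U}(d)$ is shrunk.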

3.3. Strict minimizers with an M-length support. Here we explain why minimizers with 
an M-length support are useless in general. 

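Proposition 3.9. Let $\mathrm{rank}(A) = M$, $\beta > 0$ and for $d \in \mathbb{R}^M$ set

$U_M = \{\hat u \in \mathbb{R}^N : \hat u$ is a strict (local) minimizer of $\mathcal{F}_d$ meeting $\|\hat u\|_0 = M\}$.

Put

(35) $Q_M = \bigcup_{\omega \in \Omega_M} \bigcup_{i \in \mathbb{I}_M} \{ g \in \mathbb{R}^M : \langle e_i, A_\omega^{-1} g \rangle = 0 \}$.

Then $\mathbb{R}^M \setminus Q_M$ contains a dense open subset of $\mathbb{R}^M$ and

$d \in \mathbb{R}^M \setminus Q_M \;\Longrightarrow\; \sharp U_M = \sharp\Omega_M$ and $\mathcal{F}_d(\hat u) = \beta M \;\; \forall\, \hat u \in U_M$.

Proof. Using the notation in (34), $Q_M$ reads as

$Q_M = \bigcup_{\omega \in \Omega_M} D_\omega$.

The claim on $\mathbb{R}^M \setminus Q_M$ follows from Lemma 3.7. Since $\mathrm{rank}(A) = M$, we have $\sharp\Omega_M \geq 1$.
Consider that $d \in \mathbb{R}^M \setminus Q_M$. By Proposition 3.8

$d \in \mathbb{R}^M \setminus Q_M$ and $\omega \in \Omega_M \;\Longrightarrow\; \mathcal{F}_d$ has a strict (local) minimizer $\hat u$ obeying $\sigma(\hat u) = \omega$.

Hence $\hat u \in U_M$. Therefore, we have a mapping $b : \Omega_M \to U_M$ such that $\hat u = b(\omega) \in U_M$. Using
Lemma 2.4 and Corollary 3.3, it reads as

$b(\omega) = Z_\omega(A_\omega^{-1} d)$.

For $(\omega, \varpi) \in \Omega_M \times \Omega_M$ with $\varpi \neq \omega$ one obtains $\hat u = b(\omega) \in U_M$, $\bar u = b(\varpi) \in U_M$ and $\hat u \neq \bar u$,
hence $b$ is one-to-one. Conversely, for any $\hat u \in U_M$ there is $\omega \in \Omega_M$ such that $\hat u = b(\omega)$ and
$\sigma(\hat u) = \omega$ (because $d \notin Q_M$). It follows that $b$ maps $\Omega_M$ onto $U_M$. Therefore, $\Omega_M$ and $U_M$ are in
one-to-one correspondence, i.e. $\sharp\Omega_M = \sharp U_M$.

Last, it is clear that $\omega \in \Omega_M$ and $d \notin Q_M$ lead to $\|A\hat u - d\|^2 = 0$ and $\mathcal{F}_d(\hat u) = \beta M$. □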

For any $\beta > 0$, a generic property of $\mathcal{F}_d$ is that it has $\sharp\Omega_M$ strict minimizers $\hat u$ obeying
$\|\hat u\|_0 = M$ and $\mathcal{F}_d(\hat u) = \beta M$. It is hard to discriminate between all these minimizers. Hence
the interest in minimizers with supports located in $\Omega$, i.e., strict $(M-1)$-sparse minimizers of $\mathcal{F}_d$.
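
The count $\sharp\Omega_M$ and the common value $\beta M$ can be observed on a small instance. The following is a minimal sketch, assuming NumPy and a randomly drawn $5 \times 10$ Gaussian matrix; the helper name `m_support_minimizers`, the seed and the value of $\beta$ are choices made for this illustration and do not come from the text. It enumerates every support $\omega$ of length $M$ with $A_\omega$ invertible and forms $Z_\omega(A_\omega^{-1} d)$ (indices are zero-based in the code).

```python
import itertools
import numpy as np

def m_support_minimizers(A, d, beta):
    """List (support, u, F_d(u)) for every M-length support with A_omega invertible."""
    M, N = A.shape
    found = []
    for omega in itertools.combinations(range(N), M):
        cols = list(omega)
        A_w = A[:, cols]
        if np.linalg.matrix_rank(A_w) < M:
            continue  # omega does not belong to Omega_M
        u = np.zeros(N)
        u[cols] = np.linalg.solve(A_w, d)  # u = Z_omega(A_omega^{-1} d)
        value = float(np.sum((A @ u - d) ** 2) + beta * np.count_nonzero(u))
        found.append((omega, u, value))
    return found

# Hypothetical test instance: Gaussian matrix and data, beta = 100.
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 10))
d = rng.standard_normal(5)
minimizers = m_support_minimizers(A, d, beta=100.0)
values = [m[2] for m in minimizers]
print(len(minimizers), min(values), max(values))
```

For generic data all reported values coincide with $\beta M$ up to round-off, which is why these minimizers cannot be ranked by their objective value.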

4. On the global minimizers of $\mathcal{F}_d$. The next proposition gives a necessary condition for a
global minimizer of $\mathcal{F}_d$. It follows directly from [32, Proposition 3.4] where$^{12}$ the regularization
term is $\|Du\|_0$ for a full row rank matrix $D$. For $\mathcal{F}_d$ in (1) with $\|a_i\|_2 = 1$, $\forall\, i \in \mathbb{I}_N$, a simpler
condition was derived later in [40, Theorem 12], using different tools. For completeness, the
proof for a general $A$ is outlined in Appendix 8.2.

Proposition 4.1. For $d \in \mathbb{R}^M$ and $\beta > 0$, let $\mathcal{F}_d$ have a global minimum at $\hat u$. Then$^{13}$

(36) $i \in \sigma(\hat u) \;\Longrightarrow\; |\hat u[i]| \geq \dfrac{\sqrt{\beta}}{\|a_i\|}$.



Observe that the lower bound on $\{ |\hat u[i]| : i \in \sigma(\hat u) \}$ given in (36) is independent of $d$.
This means that in general, (36) provides a pessimistic bound.

The proof of the statement shows that (36) is met also by all (local) minimizers $\hat u$ of $\mathcal{F}_d$
satisfying

$\mathcal{F}_d(\hat u) \leq \mathcal{F}_d(\hat u + \rho e_i) \quad \forall\, \rho \in \mathbb{R}, \;\; \forall\, i \in \mathbb{I}_N$.
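
The one-dimensional condition above already explains where the bound (36) comes from. The lines below are only a sketch of that reasoning under the stated condition, not a transcription of the proof in Appendix 8.2; the residual $r$ is a symbol introduced here for the illustration.

```latex
% Sketch: restriction of F_d to the line \hat u + \rho e_i, for i \in \sigma(\hat u),
% with r := d - A\hat u + a_i \hat u[i] (notation introduced for this sketch only).
\begin{align*}
\mathcal{F}_d(\hat u + \rho e_i)
  &= \|a_i(\hat u[i] + \rho) - r\|^2 + \beta\,\phi(\hat u[i] + \rho)
     + \beta \sum_{j \neq i} \phi(\hat u[j]), \\
\text{optimality for small } |\rho|
  &\;\Longrightarrow\; \hat u[i] = \frac{\langle a_i, r\rangle}{\|a_i\|^2}
     \qquad (\text{since } \hat u[i] \neq 0), \\
\mathcal{F}_d(\hat u) \le \mathcal{F}_d(\hat u - \hat u[i]\, e_i)
  &\;\Longleftrightarrow\;
     \|r\|^2 - \frac{\langle a_i, r\rangle^2}{\|a_i\|^2} + \beta \le \|r\|^2
   \;\Longleftrightarrow\; \|a_i\|^2\, \hat u[i]^2 \ge \beta .
\end{align*}
```

Taking square roots in the last inequality gives $|\hat u[i]| \geq \sqrt{\beta}/\|a_i\|$, which involves $\beta$ and $a_i$ only.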

$^{12}$ Just set $g_i = e_i$, $P = I_M$ and $H = I_N$ in [32, Proposition 3.4].
$^{13}$ Recall that $a_i \neq 0$ for all $i \in \mathbb{I}_N$ by (8) and that $\|\cdot\| = \|\cdot\|_2$.

4.1. The global minimizers of $\mathcal{F}_d$ are strict.

Remark 7. Let $d \in \mathbb{R}^M$ and $\beta > \|d\|^2$. Then $\mathcal{F}_d$ has a strict global minimum at $\hat u = 0$.
Indeed,

$v \neq 0 \;\Longrightarrow\; \|v\|_0 \geq 1 \;\Longrightarrow\; \mathcal{F}_d(0) = \|d\|^2 < \beta \leq \|Av - d\|^2 + \beta \|v\|_0$.

For least-squares regularized with a more regular $\phi$, one usually gets $\hat u = 0$ asymptotically
as $\beta \to +\infty$ but $\hat u \neq 0$ for finite values of $\beta$. This does not hold for $\mathcal{F}_d$ by Remark 7.
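
The inequality chain of Remark 7 can be tried on random trial vectors. This is a small sketch assuming NumPy; the function name `objective`, the matrix size and the seed are illustrative choices and are not part of the text.

```python
import numpy as np

def objective(A, d, beta, u):
    """F_d(u) = ||A u - d||^2 + beta * ||u||_0."""
    return float(np.sum((A @ u - d) ** 2) + beta * np.count_nonzero(u))

rng = np.random.default_rng(1)
M, N = 5, 10
A = rng.standard_normal((M, N))
d = rng.standard_normal(M)
beta = 1.001 * float(d @ d)  # any beta > ||d||^2 would do

value_at_zero = objective(A, d, beta, np.zeros(N))  # equals ||d||^2

# Random nonzero trial vectors with supports of every size from 1 to N.
trial_values = []
for _ in range(1000):
    k = int(rng.integers(1, N + 1))
    support = rng.choice(N, size=k, replace=False)
    v = np.zeros(N)
    v[support] = rng.standard_normal(k)
    trial_values.append(objective(A, d, beta, v))

print(value_at_zero < min(trial_values))
```

Every trial value is at least $\beta\|v\|_0 \geq \beta$, so the comparison does not depend on how well $Av$ fits $d$.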

Some theoretical results on the global minimizers of $\mathcal{F}_d$ have been obtained [32, 22, 40, 7].
Surprisingly, the question about the existence of global minimizers of $\mathcal{F}_d$ has never been raised.
We answer this question using the notion of asymptotically level stable functions introduced
by Auslender [2] in 2000. As usual,

$\mathrm{lev}(\mathcal{F}_d, \lambda) := \{ v \in \mathbb{R}^N : \mathcal{F}_d(v) \leq \lambda \}$ for $\lambda > \inf \mathcal{F}_d$.

The following definition is taken from [3, p. 94].

Definition 4.2. Let $\mathcal{F}_d : \mathbb{R}^N \to \mathbb{R} \cup \{+\infty\}$ be lower semicontinuous and proper. Then $\mathcal{F}_d$ is
said to be asymptotically level stable if for each $\rho > 0$, each bounded sequence $\{\lambda_k\} \in \mathbb{R}$ and
each sequence $\{v_k\} \in \mathbb{R}^N$ satisfying

(37) $v_k \in \mathrm{lev}(\mathcal{F}_d, \lambda_k), \quad \|v_k\| \to +\infty, \quad v_k \|v_k\|^{-1} \to \bar v \in \ker((\mathcal{F}_d)_\infty)$,

where $(\mathcal{F}_d)_\infty$ denotes the asymptotic (or recession) function of $\mathcal{F}_d$, there exists $k_0$ such that

(38) $v_k - \rho \bar v \in \mathrm{lev}(\mathcal{F}_d, \lambda_k) \quad \forall\, k \geq k_0$.



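One can note that a coercive function is asymptotically level stable, since (37) is empty.
We prove that our discontinuous noncoercive objective $\mathcal{F}_d$ is asymptotically level stable as
well.

Proposition 4.3. Let $\mathcal{F}_d : \mathbb{R}^N \to \mathbb{R}$ be of the form (1). Then $\ker((\mathcal{F}_d)_\infty) = \ker(A)$ and $\mathcal{F}_d$
is asymptotically level stable.

The proof is outlined in Appendix 8.3.

Theorem 4.4. Let $d \in \mathbb{R}^M$ and $\beta > 0$. Then

(i) the set of the global minimizers of $\mathcal{F}_d$

(39) $\hat U := \{ \hat u \in \mathbb{R}^N : \mathcal{F}_d(\hat u) = \min_{u \in \mathbb{R}^N} \mathcal{F}_d(u) \}$

is nonempty;
(ii) every $\hat u \in \hat U$ is a strict minimizer of $\mathcal{F}_d$, i.e.,

$\sigma(\hat u) \in \Omega_{\max}$,

hence $\|\hat u\|_0 \leq M$.

Proof. For claim (i), we use the following statement$^{14}$, whose proof can be found in the
monograph by Auslender and Teboulle [3]:

[3, Corollary 3.4.2] Let $\mathcal{F}_d : \mathbb{R}^N \to \mathbb{R} \cup \{+\infty\}$ be asymptotically level stable with $\inf \mathcal{F}_d > -\infty$.
Then the optimal set $\hat U$, as given in (39), is nonempty.

From Proposition 4.3, $\mathcal{F}_d$ is asymptotically level stable and $\inf \mathcal{F}_d \geq 0$ from (1). Hence $\hat U \neq \varnothing$.

(ii). Let $\hat u$ be a global minimizer of $\mathcal{F}_d$. Set $\hat\sigma = \sigma(\hat u)$.

If $\hat u = 0$, (ii) follows from Lemma 2.2. Suppose that the global minimizer $\hat u \neq 0$ of $\mathcal{F}_d$ is
nonstrict. Then Theorem 3.2(ii) fails to hold; i.e.,

(40) $\dim \ker(A_{\hat\sigma}) \geq 1$.

Choose $v_{\hat\sigma} \in \ker(A_{\hat\sigma}) \setminus \{0\}$ and set $v = Z_{\hat\sigma}(v_{\hat\sigma})$. Select an $i \in \hat\sigma$ obeying $v[i] \neq 0$. Define $\tilde u$ by

(41) $\tilde u := \hat u - \hat u[i]\, \dfrac{v}{v[i]}$.

We have $\tilde u[i] = 0$ and $\hat u[i] \neq 0$. Set $\tilde\sigma = \sigma(\tilde u)$. Then

(42) $\tilde\sigma \subsetneq \hat\sigma$ hence $\sharp\tilde\sigma \leq \sharp\hat\sigma - 1$.

From $v_{\hat\sigma}\, \dfrac{\hat u[i]}{v[i]} \in \ker(A_{\hat\sigma})$, using (12) and Remark 1 shows that$^{15}$ $A\tilde u = A_{\tilde\sigma}\tilde u_{\tilde\sigma} = A_{\hat\sigma}\hat u_{\hat\sigma} = A\hat u$.
Then

$\mathcal{F}_d(\tilde u) = \|A\tilde u - d\|^2 + \beta\,\sharp\tilde\sigma \leq \|A\hat u - d\|^2 + \beta(\sharp\hat\sigma - 1) = \mathcal{F}_d(\hat u) - \beta$.

It follows that $\hat u$ is not a global minimizer, hence (40) is false. Therefore $\mathrm{rank}(A_{\hat\sigma}) = \sharp\hat\sigma$ and
$\hat u$ is a strict minimizer of $\mathcal{F}_d$ (Theorem 3.2). □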

One can note that if $\mathrm{rank}(A) = M$, any global minimizer $\hat u$ of $\mathcal{F}_d$ obeys $\mathcal{F}_d(\hat u) \leq \beta M$.
According to Theorem 4.4, the global minimizers of $\mathcal{F}_d$ are strict and their number is
finite: this is a nice property that fails for many convex nonsmooth optimization problems.
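
The support-reduction step (41) used in the proof of Theorem 4.4(ii) can be replayed numerically. The sketch below assumes NumPy; the sizes, the seed and the vector `u_hat` are hypothetical, and `u_hat` is just an arbitrary vector supported on a set `sigma` whose columns are linearly dependent, not a computed minimizer. Indices are zero-based.

```python
import numpy as np

def objective(A, d, beta, u):
    """F_d(u) = ||A u - d||^2 + beta * ||u||_0."""
    return float(np.sum((A @ u - d) ** 2) + beta * np.count_nonzero(u))

rng = np.random.default_rng(2)
M, N, beta = 3, 6, 1.0
A = rng.standard_normal((M, N))
d = rng.standard_normal(M)

sigma = [0, 1, 2, 3]  # four columns in R^3, so A_sigma has a nontrivial kernel
u_hat = np.zeros(N)
u_hat[sigma] = rng.standard_normal(len(sigma))

# v_sigma spans ker(A_sigma): last right-singular vector of the 3 x 4 submatrix.
_, _, Vt = np.linalg.svd(A[:, sigma])
v = np.zeros(N)
v[sigma] = Vt[-1]

i = sigma[int(np.argmax(np.abs(v[sigma])))]  # an index of sigma with v[i] != 0
u_tilde = u_hat - u_hat[i] * v / v[i]        # construction (41)
u_tilde[i] = 0.0                             # remove round-off in the cancelled entry

print(np.count_nonzero(u_hat), np.count_nonzero(u_tilde))
print(objective(A, d, beta, u_hat) - objective(A, d, beta, u_tilde))
```

Since $A\tilde u = A\hat u$, the data term is unchanged and the objective drops by at least $\beta$, which is the contradiction exploited in the proof.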

4.2. K-sparse global minimizers for $K \leq M - 1$. In order to simplify the presentation,
in what follows we consider that

$\mathrm{rank}(A) = M < N$.

Since $\mathcal{F}_d$ has a large number (typically equal to $\sharp\Omega_M$) of strict minimizers with $\|\hat u\|_0 = M$
yielding the same value $\mathcal{F}_d(\hat u) = \beta M$ (see Proposition 3.9 and the comments given after its
proof), it is important to be sure that the global minimizers of $\mathcal{F}_d$ are $(M-1)$-sparse.

$^{14}$ This result was originally exhibited in [4] (without the notion of asymptotically level stable functions).
$^{15}$ In detail we have $A\tilde u = A_{\tilde\sigma}\tilde u_{\tilde\sigma} = A_{\hat\sigma}\tilde u_{\hat\sigma} = A_{\hat\sigma}\big(\hat u_{\hat\sigma} - v_{\hat\sigma}\tfrac{\hat u[i]}{v[i]}\big) = A_{\hat\sigma}\hat u_{\hat\sigma} = A\hat u$.

We introduce a notation which is used in the rest of this paper. For any $K \in \mathbb{I}_{M-1}$, put

(43) $\overline{\Omega}_K := \bigcup_{r=0}^{K} \Omega_r$,

where $\Omega_r$ was set up in Definition 3.1. Theorem 3.2 gives a clear meaning of the sets $\overline{\Omega}_K$$^{16}$.
For any $d \in \mathbb{R}^M$ and any $\beta > 0$, for any fixed $K \in \mathbb{I}_{M-1}$, the set $\overline{\Omega}_K$ is composed of the supports
of all possible K-sparse strict (local) minimizers of $\mathcal{F}_d$.

The next proposition checks the existence of $\beta > 0$ ensuring that all the global minimizers
of $\mathcal{F}_d$ are K-sparse, for some $K \in \mathbb{I}_{M-1}$.

Proposition 4.5. Let $d \in \mathbb{R}^M$. For any $K \in \mathbb{I}_{M-1}$, there exists $\beta_K \geq 0$ such that if $\beta > \beta_K$,
then each global minimizer $\hat u$ of $\mathcal{F}_d$ satisfies

(44) $\|\hat u\|_0 \leq K$ and $\sigma(\hat u) \in \overline{\Omega}_K$.

One can choose $\beta_K = \|A\tilde u - d\|^2$ where $\tilde u$ solves $(\mathcal{P}_\omega)$ for some $\omega \in \Omega_K$.

The proof is given in Appendix 8.4. The value of $\beta_K$ in the statement is easy to compute,
but in general it is not sharp$^{17}$.
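
A threshold of this kind is cheap to evaluate on small problems. The sketch below, assuming NumPy, takes the smallest residual over all K-length supports with full column rank as the threshold (any single such support also gives an admissible, possibly larger, $\beta_K$) and then checks the sparsity of the global minimizer found by exhaustive search. The function names, sizes and seed are illustrative assumptions, and the exhaustive search is only feasible for small $N$.

```python
import itertools
import numpy as np

def lsq_on_support(A, d, omega):
    """Least squares restricted to the columns in omega; returns (u, ||A u - d||^2)."""
    u = np.zeros(A.shape[1])
    cols = list(omega)
    if cols:
        u[cols] = np.linalg.lstsq(A[:, cols], d, rcond=None)[0]
    return u, float(np.sum((A @ u - d) ** 2))

def beta_threshold(A, d, K):
    """Smallest residual over all K-length supports whose columns are independent."""
    N = A.shape[1]
    return min(lsq_on_support(A, d, w)[1]
               for w in itertools.combinations(range(N), K)
               if np.linalg.matrix_rank(A[:, list(w)]) == K)

def global_minimizer(A, d, beta):
    """Exhaustive search over all supports of length 0..M."""
    M, N = A.shape
    best_u, best_val = None, np.inf
    for r in range(M + 1):
        for w in itertools.combinations(range(N), r):
            u, res = lsq_on_support(A, d, w)
            val = res + beta * np.count_nonzero(u)
            if val < best_val:
                best_u, best_val = u, val
    return best_u, best_val

rng = np.random.default_rng(3)
A = rng.standard_normal((5, 10))
d = rng.standard_normal(5)

sparsity = {}
for K in range(1, 5):
    beta = 1.01 * beta_threshold(A, d, K) + 1e-12  # slightly above the threshold
    u_hat, _ = global_minimizer(A, d, beta)
    sparsity[K] = int(np.count_nonzero(u_hat))
print(sparsity)
```

The check works because any vector with at least $K+1$ nonzero entries has objective value at least $\beta(K+1)$, while the least-squares solution on the best K-length support has value at most $\beta_K + \beta K$.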

5. Uniqueness of the global minimizer of $\mathcal{F}_d$. The presentation is simplified using the
notation introduced next. Given a matrix $A \in \mathbb{R}^{M \times N}$, with any $\omega \in \Omega$ (see Definition 3.1),
we associate the $M \times M$ matrix $\Pi_\omega$ that yields the orthogonal projection$^{18}$ onto the subspace
spanned by the columns of $A_\omega$:

(45) $\Pi_\omega := A_\omega \left( A_\omega^T A_\omega \right)^{-1} A_\omega^T$.

For $\omega \in \Omega$, the projector in Remark 1 reads $\Pi_{\mathrm{range}(A_\omega)} = \Pi_\omega$.

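Checking whether a global minimizer $\hat u$ of $\mathcal{F}_d$ is unique requires us to compare its value
$\mathcal{F}_d(\hat u)$ with the values $\mathcal{F}_d(\bar u)$ of the concurrent strict minimizers $\bar u$. Let $\hat u$ be an $(M-1)$-sparse
strict (local) minimizer of $\mathcal{F}_d$. Then $\hat\sigma = \sigma(\hat u) \in \Omega$. Using Remark 1 shows that

$\mathcal{F}_d(\hat u) = \|A_{\hat\sigma}\hat u_{\hat\sigma} - d\|^2 + \beta\,\sharp\hat\sigma = \|\Pi_{\hat\sigma} d - d\|^2 + \beta\,\sharp\hat\sigma$

(46) $\qquad\; = d^T (I_M - \Pi_{\hat\sigma})\, d + \beta\,\sharp\hat\sigma$.

Let $\bar u$ be another $(M-1)$-sparse strict minimizer of $\mathcal{F}_d$; set $\bar\sigma = \sigma(\bar u)$. Then

$\mathcal{F}_d(\hat u) - \mathcal{F}_d(\bar u) = d^T (\Pi_{\bar\sigma} - \Pi_{\hat\sigma})\, d + \beta(\sharp\hat\sigma - \sharp\bar\sigma)$.

If both $\hat u$ and $\bar u \neq \hat u$ are global minimizers of $\mathcal{F}_d$, the previous equality yields

(47) $\mathcal{F}_d(\hat u) = \mathcal{F}_d(\bar u) \;\Longleftrightarrow\; d^T (\Pi_{\bar\sigma} - \Pi_{\hat\sigma})\, d = -\beta(\sharp\hat\sigma - \sharp\bar\sigma)$.

Equation (47) reveals that the uniqueness of the global minimizer of $\mathcal{F}_d$ cannot be guaranteed
without suitable assumptions on $A$ and on $d$.

$^{16}$ Clearly, $\overline{\Omega}_{M-1} = \Omega$.

$^{17}$ For $\beta \geq \beta_K$ the global minimizers of $\mathcal{F}_d$ might be k-sparse for $k \ll K$. A sharper $\beta_K$ can be obtained as
$\beta_K = \min_{\omega \in \Omega_K} \{ \|A\tilde u - d\|^2 : \tilde u$ solves $(\mathcal{P}_\omega)$ for $\omega \in \Omega_K \}$.

$^{18}$ If $\omega = \varnothing$, we have $A_\omega \in \mathbb{R}^{M \times 0}$ and so $\Pi_\omega$ is an $M \times M$ null matrix.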
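
Identity (46) and the difference of values that leads to (47) are easy to confirm numerically. The sketch below assumes NumPy; the helper names, the two supports and the seed are illustrative, and the supports are assumed generic so that the least-squares solution on each support has no vanishing entry.

```python
import numpy as np

def projector(A, omega):
    """Pi_omega = A_w (A_w^T A_w)^{-1} A_w^T as in (45); A_w must have full column rank."""
    A_w = A[:, list(omega)]
    return A_w @ np.linalg.solve(A_w.T @ A_w, A_w.T)

def strict_minimizer_value(A, d, beta, omega):
    """F_d at the least-squares solution supported on omega (support assumed equal to omega)."""
    cols = list(omega)
    u_w = np.linalg.lstsq(A[:, cols], d, rcond=None)[0]
    return float(np.sum((A[:, cols] @ u_w - d) ** 2) + beta * len(cols))

rng = np.random.default_rng(4)
M, N, beta = 5, 10, 2.0
A = rng.standard_normal((M, N))
d = rng.standard_normal(M)
s_hat, s_bar = (1, 4, 7), (0, 2, 5, 8)  # two hypothetical supports (zero-based)

# Identity (46): F_d(u_hat) = d^T (I - Pi) d + beta * #sigma.
lhs46 = strict_minimizer_value(A, d, beta, s_hat)
rhs46 = float(d @ (np.eye(M) - projector(A, s_hat)) @ d + beta * len(s_hat))

# Difference of the two values, as displayed before (47).
lhs47 = strict_minimizer_value(A, d, beta, s_hat) - strict_minimizer_value(A, d, beta, s_bar)
rhs47 = float(d @ (projector(A, s_bar) - projector(A, s_hat)) @ d
              + beta * (len(s_hat) - len(s_bar)))
print(abs(lhs46 - rhs46), abs(lhs47 - rhs47))
```

Both printed discrepancies should be at round-off level; a tie between two global minimizers therefore reduces to a single scalar equation in $d$.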
5.1. A generic assumption on A. We adopt an assumption on the matrix $A$ in $\mathcal{F}_d$ in
order to restrict the cases when (47) takes place for some supports $\hat\sigma \neq \bar\sigma$ obeying $\sharp\hat\sigma = \sharp\bar\sigma$.

H1. The matrix $A \in \mathbb{R}^{M \times N}$, where $M < N$, is such that for some given $K \in \mathbb{I}_{M-1}$,

(48) $r \in \mathbb{I}_K$ and $(\omega, \varpi) \in (\Omega_r \times \Omega_r)$, $\omega \neq \varpi \;\Longrightarrow\; \Pi_\omega \neq \Pi_\varpi$.

Assumption H1 means that the angle (or the gap) between the equidimensional subspaces
$\mathrm{range}(A_\omega)$ and $\mathrm{range}(A_\varpi)$ must be nonzero [29]. For instance, if $(i, j) \in \mathbb{I}_N \times \mathbb{I}_N$ satisfy $i \neq j$,
H1 implies that $a_i \neq \kappa a_j$ for any $\kappa \in \mathbb{R} \setminus \{0\}$ since $\Pi_{\{i\}} = a_i a_i^T / \|a_i\|^2$.

Checking whether H1 holds for a given matrix $A$ requires a combinatorial search over all
possible couples $(\varpi, \omega) \in (\Omega_r \times \Omega_r)$ satisfying $\varpi \neq \omega$, $\forall\, r \in \mathbb{I}_K$. This is hard to do. Instead,
we wish to know whether or not H1 is a practical limitation. Using some auxiliary claims, we
shall show that H1 fails only for a closed negligible subset of matrices in the space of all $M \times N$
real matrices.

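Lemma 5.1. Given $r \in \mathbb{I}_{M-1}$ and $\varpi \in \Omega_r$, define the following set of submatrices of $A$:

$\mathcal{H}_\varpi = \{ A_\omega : \omega \in \Omega_r$ and $\Pi_\omega = \Pi_\varpi \}$.

Then $\mathcal{H}_\varpi$ belongs to an $(r \times r)$-dimensional subspace of the space of all $M \times r$ matrices.

Proof. Using the fact that $\varpi \in \Omega_r$ and $\omega \in \Omega_r$, we have$^{19}$

(49) $\Pi_\omega = \Pi_\varpi \;\Longleftrightarrow\; A_\omega = \Pi_\varpi A_\omega$.

Therefore $\mathcal{H}_\varpi$ equivalently reads

(50) $\mathcal{H}_\varpi = \{ A_\omega : \omega \in \Omega_r$ and $A_\omega = \Pi_\varpi A_\omega \}$.

Let $A_\omega \in \mathcal{H}_\varpi$. Denote the columns of $A_\omega$ by $\tilde a_i$ for $i \in \mathbb{I}_r$. Then (50) yields

$\Pi_\varpi \tilde a_i = \tilde a_i, \;\forall\, i \in \mathbb{I}_r \;\Longrightarrow\; \tilde a_i \in \mathrm{range}(A_\varpi), \;\forall\, i \in \mathbb{I}_r$.

Hence all $\tilde a_i$, $i \in \mathbb{I}_r$, live in the r-dimensional vector subspace $\mathrm{range}(A_\varpi)$. All the columns of
each matrix $A_\omega \in \mathcal{H}_\varpi$ belong to $\mathrm{range}(A_\varpi)$ as well. It follows that $\mathcal{H}_\varpi$ belongs to a (closed)
subspace of dimension $r \times r$ in the space of all $M \times r$ matrices, where $r \leq M - 1$. □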

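More details on the submatrices of $A$ living in $\mathcal{H}_\varpi$ are given next.

Remark 8. The closed negligible subset $\mathcal{H}_\varpi$ in Lemma 5.1 is formed from all the submatrices
of $A$ that are column equivalent to $A_\varpi$ (see [29, p. 171]), that is,

(51) $A_\omega \in \mathcal{H}_\varpi \;\Longleftrightarrow\; \exists\, P \in \mathbb{R}^{r \times r}$ such that $\mathrm{rank}(P) = r$ and $A_\omega = A_\varpi P$.

Observe that $P$ has $r^2$ unknowns that must satisfy $Mr$ equations and that $P$ must be invertible.
It should be quite unlikely that such a matrix $P$ does exist.

$^{19}$ Using (45), as well as the fact that $A_\omega = \Pi_\omega A_\omega$, $\forall\, \omega \in \Omega_r$, one easily derives (49) since
$\Pi_\omega = \Pi_\varpi \;\Longrightarrow\; A_\omega (A_\omega^T A_\omega)^{-1} A_\omega^T = \Pi_\varpi \;\Longrightarrow\; A_\omega = \Pi_\varpi A_\omega$, and conversely
$A_\omega = \Pi_\varpi A_\omega \;\Longrightarrow\; A_\omega (A_\omega^T A_\omega)^{-1} A_\omega^T = \Pi_\varpi A_\omega (A_\omega^T A_\omega)^{-1} A_\omega^T \;\Longrightarrow\; \Pi_\omega = \Pi_\varpi \Pi_\omega \;\Longrightarrow\; \Pi_\omega = \Pi_\varpi$,
where the last implication uses that both projectors have rank $r$.

This remark can help to discern whether or not structured dictionaries satisfy H1.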
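
A deliberately structured example shows how column equivalence in the sense of (51) makes H1 fail. The sketch assumes NumPy; the $4 \times 6$ matrix, the seed and the helper `projector` are hypothetical choices for this illustration, with zero-based column indices.

```python
import numpy as np

def projector(A, omega):
    """Orthogonal projector onto the span of the columns of A indexed by omega, see (45)."""
    A_w = A[:, list(omega)]
    return A_w @ np.linalg.solve(A_w.T @ A_w, A_w.T)

rng = np.random.default_rng(5)
A = rng.standard_normal((4, 6))
# Force column 2 into span(column 0, column 1):
# A_{0,2} = A_{0,1} P with the invertible matrix P = [[1, 1], [0, 2]].
A[:, 2] = A[:, 0] + 2.0 * A[:, 1]

gap_equivalent = np.linalg.norm(projector(A, (0, 1)) - projector(A, (0, 2)), 2)
gap_generic = np.linalg.norm(projector(A, (0, 1)) - projector(A, (3, 4)), 2)
print(gap_equivalent, gap_generic)
```

The first gap vanishes up to round-off, so this matrix violates (48) for $r = 2$, whereas the second pair of supports, built from unconstrained random columns, keeps a clearly positive gap.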

Next we inspect the set of all matrices $A$ failing assumption H1.

Lemma 5.2. Consider the set $\mathcal{H}$ formed from $M \times N$ real matrices described next:

$\mathcal{H} := \{ A \in \mathbb{R}^{M \times N} : \exists\, r \in \mathbb{I}_{M-1}, \; \exists\, (\varpi, \omega) \in \Omega_r \times \Omega_r, \; \varpi \neq \omega$ and $\Pi_\varpi = \Pi_\omega \}$.

Then $\mathcal{H}$ belongs to a finite union of vector subspaces in $\mathbb{R}^{M \times N}$ whose Lebesgue measure in
$\mathbb{R}^{M \times N}$ is null.

Proof. Let $A \in \mathcal{H}$. Then there exist at least one integer $r \in \mathbb{I}_{M-1}$ and at least one pair
$(\varpi, \omega) \in \Omega_r \times \Omega_r$ such that $\varpi \neq \omega$ and $\Pi_\varpi = \Pi_\omega$. Using Lemma 5.1, $A$ contains (at least)
one $M \times r$ submatrix $A_\varpi$ belonging to an $r \times r$ vector subspace in the space of all $M \times r$ real
matrices. Identifying $A$ with an $MN$-length vector, its entries are included in a vector subspace
of $\mathbb{R}^{MN}$ of dimension no larger than $MN - 1$. The claim of the lemma is straightforward. □

We can now clarify assumption H1 and show that it is not restrictive.

Theorem 5.3. Given an arbitrary $K \in \mathbb{I}_{M-1}$, consider the set of $M \times N$ real matrices below

$\mathcal{A}_K := \{ A \in \mathbb{R}^{M \times N} : A$ satisfies H1 for $K \}$.

Then $\mathcal{A}_K$ contains an open and dense subset in the space of all $M \times N$ real-valued matrices.

Proof. The complement of $\mathcal{A}_K$ in the space of all $M \times N$ real matrices reads as

$\mathcal{A}_K^c = \{ A \in \mathbb{R}^{M \times N}$ : H1 fails for $A$ and $K \}$.

It is clear that $\mathcal{A}_K^c \subset \mathcal{H}$, where $\mathcal{H}$ is described in Lemma 5.2. By the same lemma, $\mathcal{A}_K^c$ is
included in a closed subset of vector subspaces in $\mathbb{R}^{M \times N}$ whose Lebesgue measure in $\mathbb{R}^{M \times N}$ is
null. Consequently, $\mathcal{A}_K$ satisfies the statement of the theorem. □

For any $K \in \mathbb{I}_{M-1}$, H1 is a generic property of all $M \times N$ real matrices meeting $M < N$.
This is the meaning of Theorem 5.3 in terms of Definition 1.2.
We can note that

$\mathcal{A}_{K+1} \subseteq \mathcal{A}_K, \quad \forall\, K \in \mathbb{I}_{M-2}$.

One can hence presume that H1 is weakened as $K$ decreases. This issue is illustrated in
section 6.

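5.2. A generic assumption on d. A preliminary result is stated next.

Lemma 5.4. Let $(\omega, \varpi) \in \overline{\Omega}_K \times \overline{\Omega}_K$ for $\omega \neq \varpi$ and let H1 hold for $K \in \mathbb{I}_{M-1}$. Given $\kappa \in \mathbb{R}$,
define

$T_\kappa := \{ g \in \mathbb{R}^M : g^T (\Pi_\omega - \Pi_\varpi)\, g = \kappa \}$.

Then $T_\kappa$ is a closed subset of $\mathbb{R}^M$ and $\mathbb{L}^M(T_\kappa) = 0$.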




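Proof. Define $f : \mathbb{R}^M \to \mathbb{R}$ by $f(g) = g^T (\Pi_\omega - \Pi_\varpi)\, g$. Then

(52) $T_\kappa = \{ g \in \mathbb{R}^M : f(g) = \kappa \}$.

Using H1, $T_\kappa$ is closed in $\mathbb{R}^M$. Set

$Q = \{ g \in \mathbb{R}^M : \nabla f(g) \neq 0 \}$ and $Q^c = \mathbb{R}^M \setminus Q$.

Consider an arbitrary $g \in T_\kappa \cap Q$. From H1, $\mathrm{rank}(\nabla f(g)) = 1$. For simplicity, assume
that

$\nabla f(g)[M] = \dfrac{\partial f(g)}{\partial g[M]} \neq 0$.

By the implicit functions theorem, there are open neighborhoods $O_g \subset Q \subset \mathbb{R}^M$ and $V \subset \mathbb{R}^{M-1}$
of $g$ and $g_{\mathbb{I}_{M-1}}$, respectively, and a unique $C^1$-function $h_g : V \to \mathbb{R}$ with $\nabla h_g$ bounded, such
that

(53) $\gamma = (\gamma_{\mathbb{I}_{M-1}}, \gamma[M]) \in O_g$ and $f(\gamma) = \kappa \;\Longleftrightarrow\; \gamma_{\mathbb{I}_{M-1}} \in V$ and $\gamma[M] = h_g(\gamma_{\mathbb{I}_{M-1}})$.

From (52) and (53) it follows that$^{20}$

$O_g \cap T_\kappa = \psi_g\left( O_g \cap (\mathbb{R}^{M-1} \times \{0\}) \right)$,

where $\psi_g$ is a diffeomorphism on $O_g$ given by

$\psi_g[i](\gamma) = \gamma[i], \; 1 \leq i \leq M - 1$ and $\psi_g[M](\gamma) = h_g(\gamma_{\mathbb{I}_{M-1}}) + \gamma[M]$.

Since $\mathbb{L}^M\left( O_g \cap (\mathbb{R}^{M-1} \times \{0\}) \right) = 0$ and $\nabla\psi_g$ is bounded on $O_g$, it follows from [37, Lemma
7.25] that$^{21}$ $\mathbb{L}^M(O_g \cap T_\kappa) = 0$. We have thus obtained that

(54) $S \subset Q$ and $S$ bounded $\;\Longrightarrow\; \mathbb{L}^M(S \cap T_\kappa) = 0$.

Using that every open subset of $\mathbb{R}^M$ can be written as a countable union$^{22}$ of cubes in $\mathbb{R}^M$
[36, 16, 38], the result in (54) entails that $\mathbb{L}^M(T_\kappa \cap Q) = 0$.

Next, $Q^c = \ker(\Pi_\omega - \Pi_\varpi)$. By H1, $\dim \ker(\Pi_\omega - \Pi_\varpi) \leq M - 1$. Hence $\mathbb{L}^M(T_\kappa \cap Q^c) = 0$.

The proof follows from the equality $\mathbb{L}^M(T_\kappa) = \mathbb{L}^M(T_\kappa \cap Q) + \mathbb{L}^M(T_\kappa \cap Q^c)$. □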

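We exhibit a closed negligible subset of data in $\mathbb{R}^M$ that can still meet the equality in (47).

Proposition 5.5. For $\beta > 0$ and $K \in \mathbb{I}_{M-1}$, put

(55) $\Sigma_K := \bigcup_{n=-K}^{K} \; \bigcup_{\omega \in \overline{\Omega}_K} \; \bigcup_{\varpi \in \overline{\Omega}_K} \{ g \in \mathbb{R}^M : \omega \neq \varpi$ and $g^T (\Pi_\omega - \Pi_\varpi)\, g = n\beta \}$,

where $\overline{\Omega}_K$ is given in (43). Let H1 hold for $K$. Then $\Sigma_K$ is closed in $\mathbb{R}^M$ and $\mathbb{L}^M(\Sigma_K) = 0$.

$^{20}$ From (53), $V$ is the restriction of $O_g$ to $\mathbb{R}^{M-1}$.

$^{21}$ The same result follows from the change-of-variables theorem for the Lebesgue integral, see e.g. [37].

$^{22}$ From (54), adjacent cubes can also intersect in our case.

Proof. For some $n \in \{-K, \cdots, K\}$ and $(\omega, \varpi) \in (\overline{\Omega}_K \times \overline{\Omega}_K)$ such that $\omega \neq \varpi$, put

$S := \{ g \in \mathbb{R}^M : g^T (\Pi_\omega - \Pi_\varpi)\, g = n\beta \}$.

If $\sharp\omega \neq \sharp\varpi$, then $\mathrm{rank}(\Pi_\omega - \Pi_\varpi) \geq 1$. If $\sharp\omega = \sharp\varpi$, H1 guarantees that $\mathrm{rank}(\Pi_\omega - \Pi_\varpi) \geq 1$,
yet again. The number $n\beta \in \mathbb{R}$ is given. According to Lemma 5.4, $S$ is a closed subset of
$\mathbb{R}^M$ and $\mathbb{L}^M(S) = 0$. The conclusion follows from the fact that $\Sigma_K$ is a finite union of subsets
like $S$. □

We assume hereafter that if H1 holds for some $K \in \mathbb{I}_{M-1}$, data $d$ satisfy

$d \in \mathbb{R}^M \setminus \Sigma_K$.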

5.3. The unique global minimizer of $\mathcal{F}_d$ is K-sparse for $K \leq (M - 1)$. We are looking
for guarantees that $\mathcal{F}_d$ has a unique global minimizer $\hat u$ obeying

$\|\hat u\|_0 \leq K$ for some fixed $K \in \mathbb{I}_{M-1}$.

This is the aim of the next theorem.

Theorem 5.6. Given $K \in \mathbb{I}_{M-1}$, let H1 hold for $K$, $\beta > \beta_K$ where $\beta_K$ meets Proposition 4.5
and $\Sigma_K \subset \mathbb{R}^M$ reads as in (55). Consider that

$d \in \mathbb{R}^M \setminus \Sigma_K$.

Then

(i) the set $\mathbb{R}^M \setminus \Sigma_K$ is open and dense in $\mathbb{R}^M$;

(ii) $\mathcal{F}_d$ has a unique global minimizer $\hat u$, and $\|\hat u\|_0 \leq K$.

Proof. Statement (i) follows from Proposition 5.5.

Since $\beta > \beta_K$, all global minimizers of $\mathcal{F}_d$ have their support in $\overline{\Omega}_K$ (Proposition 4.5).
Using the fact that $d \in \mathbb{R}^M \setminus \Sigma_K$, the definition of $\Sigma_K$ in (55) shows that

(56) $-K \leq n \leq K$ and $(\omega, \varpi) \in (\overline{\Omega}_K \times \overline{\Omega}_K)$, $\omega \neq \varpi \;\Longrightarrow\; d^T (\Pi_\omega - \Pi_\varpi)\, d \neq n\beta$.

The proof is conducted by contradiction. Let $\hat u$ and $\bar u \neq \hat u$ be two global minimizers of $\mathcal{F}_d$.
Then

$\hat\sigma := \sigma(\hat u) \in \overline{\Omega}_K$ and $\bar\sigma := \sigma(\bar u) \in \overline{\Omega}_K$,

and $\hat\sigma \neq \bar\sigma$. By $\mathcal{F}_d(\hat u) = \mathcal{F}_d(\bar u)$, (47) yields

(57) $d^T (\Pi_{\hat\sigma} - \Pi_{\bar\sigma})\, d = \beta(\sharp\hat\sigma - \sharp\bar\sigma)$.

An enumeration of all possible values of $\sharp\hat\sigma - \sharp\bar\sigma$ shows that

$\beta(\sharp\hat\sigma - \sharp\bar\sigma) = n\beta$ for some $n \in \{-K, \cdots, K\}$.

Inserting this equation into (57) leads to

$d^T (\Pi_{\hat\sigma} - \Pi_{\bar\sigma})\, d = n\beta$ for some $n \in \{-K, \cdots, K\}$.

The last result contradicts (56); hence it violates the assumptions H1 and $d \in \mathbb{R}^M \setminus \Sigma_K$.
Consequently, $\mathcal{F}_d$ cannot have two global minimizers. Since $\mathcal{F}_d$ always has global minimizers
(Theorem 4.4(i)), it follows that $\mathcal{F}_d$ has a unique global minimizer, say $\hat u$. And $\|\hat u\|_0 \leq K$
because $\sigma(\hat u) \in \overline{\Omega}_K$. □

For $\beta > \beta_K$, the objective $\mathcal{F}_d$ in (1) has a unique global minimizer and it is K-sparse for
$K \leq M - 1$. For all $K \in \mathbb{I}_{M-1}$, the claim holds true in a generic sense. This is the message
of Theorem 5.6 using Definition 1.2.

6. Numerical illustrations. 

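6.1. On assumption H1. Assumption H1 requires that $\Pi_\omega \neq \Pi_\varpi$ when $(\omega, \varpi) \in \Omega_r \times \Omega_r$,
$\omega \neq \varpi$ for all $r \leq K \in \mathbb{I}_{M-1}$. From a practical viewpoint, the magnitude of $(\Pi_\omega - \Pi_\varpi)$ should
be discernible. One way to assess the viability of H1 for a matrix $A$ and $K \in \mathbb{I}_{M-1}$ is to
calculate

(58) $\xi_K(A) := \min_{r \in \mathbb{I}_K} \mu_r(A)$,

where $\mu_r(A) = \min_{(\omega, \varpi) \in \Omega_r \times \Omega_r,\; \omega \neq \varpi} \|\Pi_\omega - \Pi_\varpi\|_2, \quad \forall\, r \in \mathbb{I}_K$.

In fact, $\|\Pi_\omega - \Pi_\varpi\|_2 = \sin(\theta)$, where $\theta \in [0, \pi/2]$ is the maximum angle between $\mathrm{range}(A_\omega)$
and $\mathrm{range}(A_\varpi)$; see [29, p. 456]. These subspaces have the same dimension and $\Pi_\omega \neq \Pi_\varpi$
when $(\omega, \varpi) \in \Omega_r \times \Omega_r$, $\omega \neq \varpi$ and $r \in \mathbb{I}_K$, hence $\theta \in (0, \pi/2]$. Consequently,

H1 holds for $K \in \mathbb{I}_{M-1} \;\Longleftrightarrow\; \mu_r(A) \in (0, 1] \;\; \forall\, r \in \mathbb{I}_K \;\Longleftrightarrow\; \xi_K(A) \in (0, 1]$.

According to (58), we have $\xi_K \geq \xi_{K+1}$, $\forall\, K \in \mathbb{I}_{M-2}$. Our guess that assumption H1 is lightened
when $K$ decreases (see the comments following the proof of Theorem 5.3) means that

(59) $\xi_1(A) > \cdots > \xi_{M-1}(A)$.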
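
The quantities in (58) can be obtained by direct enumeration when $M$ and $N$ are small. The following is a sketch under that assumption, using NumPy; the function names, the random test matrix and the seed are illustrative and are not the matrices used for the tables of this section.

```python
import itertools
import numpy as np

def projector(A, omega):
    """Pi_omega from (45); the columns indexed by omega must be linearly independent."""
    A_w = A[:, list(omega)]
    return A_w @ np.linalg.solve(A_w.T @ A_w, A_w.T)

def mu_and_xi(A):
    """Return (mu_1..mu_{M-1}, xi_1..xi_{M-1}) of (58) by exhaustive search."""
    M, N = A.shape
    mu = []
    for r in range(1, M):
        supports = [w for w in itertools.combinations(range(N), r)
                    if np.linalg.matrix_rank(A[:, list(w)]) == r]  # Omega_r
        projs = [projector(A, w) for w in supports]
        # Spectral norm of the difference for every pair of distinct supports.
        mu.append(min(np.linalg.norm(P - Q, 2)
                      for P, Q in itertools.combinations(projs, 2)))
    mu = np.array(mu)
    xi = np.minimum.accumulate(mu)  # xi_K = min over r <= K of mu_r
    return mu, xi

rng = np.random.default_rng(6)
A = rng.standard_normal((5, 10))
mu, xi = mu_and_xi(A)
print(np.round(mu, 4), np.round(xi, 4))
```

The running minimum makes $\xi_K$ nonincreasing in $K$ by construction, so only the strictness in (59) carries information about the matrix. For $M = 5$, $N = 10$ the search visits about thirty thousand pairs; the cost grows combinatorially with $N$.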

We provide numerical tests on two subsets of real-valued random matrices for $M = 5$ and
$N = 10$, denoted by $\mathcal{A}_{20}$ and $\mathcal{A}_{1000}$. The values of $\xi_K(\cdot)$, $K \in \mathbb{I}_{M-1} = \mathbb{I}_4$, for every matrix
in $\mathcal{A}_{20}$ and in $\mathcal{A}_{1000}$, were calculated using an exhaustive combinatorial search. All tested
matrices satisfy assumption H1, which confirms Theorem 5.3 and its consequences. In order
to evaluate the extent of H1, we computed the worst and the best values of $\xi_K(\cdot)$ over these
sets:

(60) $\xi_K^{\mathrm{worst}} = \min_{A \in \mathcal{A}} \xi_K(A), \qquad \xi_K^{\mathrm{best}} = \max_{A \in \mathcal{A}} \xi_K(A), \qquad \forall\, K \in \mathbb{I}_{M-1}, \;\; \mathcal{A} \in \{\mathcal{A}_{20}, \mathcal{A}_{1000}\}$.

Set $\mathcal{A}_{20}$. This set was formed from 20 matrices $A^n$, $n \in \mathbb{I}_{20}$, of size $5 \times 10$. The components
of each matrix $A^n$ were drawn independently from the standard normal
distribution with mean zero and variance one. The values of $\xi_K(\cdot)$ are depicted in Fig. 6.1. We
have$^{23}$ $\xi_1(A^{10}) = \xi_2(A^{10})$ and $\xi_1(A^{17}) = \xi_2(A^{17})$. In all other cases (59) is satisfied. Fig. 6.1
clearly shows that $\xi_K(\cdot)$ increases as $K$ decreases (from $M - 1$ to 1).



[Figure 6.1: two panels. (a) $\xi_K(A^n)$ for the matrices in $\mathcal{A}_{20}$. (b) Zoom of (a) along the y-axis. Legend: K = 1 green "o"; K = 2 red; K = 3 blue "□"; K = 4 magenta "△".]

Figure 6.1. x-axis: the list of the 20 random matrices in $\mathcal{A}_{20}$. (a) y-axis: the value $\xi_K(A^n)$ according
to (58) for all $K \in \mathbb{I}_{M-1}$ and for all $n \in \mathbb{I}_{20}$. The plot in (b) is a zoom of (a) along the y-axis.

The worst and the best values of $\xi_K(\cdot)$ over the whole set $\mathcal{A}_{20}$ are displayed in Table 6.1.

Table 6.1
The worst and the best values of $\xi_K(A)$, for $K \in \mathbb{I}_{M-1}$, over the set $\mathcal{A}_{20}$, see (60).

                            K = 1     K = 2     K = 3     K = 4
$\xi_K^{\mathrm{worst}}$    0.3519    0.1467    0.0676    0.0072
$\xi_K^{\mathrm{best}}$     0.8666    0.5881    0.3966    0.0785



The set $\mathcal{A}_{1000}$ was composed of one thousand $5 \times 10$ matrices $A^n$, $n \in \mathbb{I}_{1000}$.
The entries of each matrix $A^n$ were independent and uniformly sampled on $[-1, 1]$. The
obtained values for $\xi_K^{\mathrm{worst}}$ and $\xi_K^{\mathrm{best}}$, calculated according to (60), are shown in Table 6.2.

For $K \in \mathbb{I}_3$, the best values of $\xi_K(\cdot)$ were obtained for the same matrix, $A^{964}$. Note that
$\xi_4(A^{964}) = 0.0425 > \xi_4^{\mathrm{worst}}$. The worst values in Table 6.2 are smaller than those in Table 6.1,
while the best values in Table 6.2 are larger than those in Table 6.1; one credible reason is
that $\mathcal{A}_{1000}$ is much larger than $\mathcal{A}_{20}$.

$^{23}$ This is why on the figure, in columns 10 and 17, the green "o" and the red "0" overlap.

Table 6.2
The worst and the best values of $\xi_K(A)$, for $K \in \mathbb{I}_{M-1}$, over the set $\mathcal{A}_{1000}$.

                            K = 1     K = 2     K = 3     K = 4
$\xi_K^{\mathrm{worst}}$    0.1085    0.0235    0.0045    0.0001
$\xi_K^{\mathrm{best}}$     0.9526    0.8625    0.5379    0.1152



Table 6.3
Percentage of the cases in $\mathcal{A}_{1000}$ when (59) fails to hold.

                      $\xi_1(A^n) = \xi_2(A^n)$    $\xi_2(A^n) = \xi_3(A^n)$    $\xi_3(A^n) = \xi_4(A^n)$
occurrences $\{n\}$   5%                           1.6%                         0.1%



Overall, (59) is satisfied on $\mathcal{A}_{1000}$: the percentages in Table 6.3 are pretty small. All three
tables and Figure 6.1 agree with our guess that H1 is more viable for smaller values of $K$.

Based on the magnitudes for $\xi_K^{\mathrm{best}}$ in Tables 6.1 and 6.2, one can expect that there are
some classes of matrices (random or not) that fit H1 for larger values of $\xi_K(\cdot)$.

6.2. On the global minimizers of $\mathcal{F}_d$. Here we summarize the outcome of a series of
experiments corresponding to several matrices $A \in \mathbb{R}^{M \times N}$ where $M = 5$ and $N = 10$, satisfying
H1 for $K = M - 1$, different original vectors $\mathring{u} \in \mathbb{R}^N$ and data samples $d = A\mathring{u} + \text{noise}$, for
various values of $\beta > 0$. In each experiment, we computed the complete list of all different
strict (local) minimizers of $\mathcal{F}_d$, say $(\hat u^i)_{i=1}^{n}$. Then the sequence of values $(\mathcal{F}_d(\hat u^i))_{i=1}^{n}$ was
sorted in increasing order, $\mathcal{F}_d(\hat u^{i_1}) \leq \mathcal{F}_d(\hat u^{i_2}) \leq \cdots \leq \mathcal{F}_d(\hat u^{i_n})$. A global minimizer $\hat u^{i_1}$
is unique provided that $\mathcal{F}_d(\hat u^{i_1}) < \mathcal{F}_d(\hat u^{i_2})$. In order to discard numerical errors, we also
checked whether $|\mathcal{F}_d(\hat u^{i_1}) - \mathcal{F}_d(\hat u^{i_2})|$ is easy to detect.
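
The listing-and-sorting procedure just described can be sketched as follows, assuming NumPy. The function name `strict_minimizers`, the random instance, the seed and the value of $\beta$ are assumptions of this illustration; it is not the code used for the reported experiments.

```python
import itertools
import numpy as np

def strict_minimizers(A, d, beta):
    """All strict (local) minimizers of F_d with their values, sorted by value."""
    M, N = A.shape
    table = {}
    for r in range(M + 1):
        for w in itertools.combinations(range(N), r):
            cols = list(w)
            if r > 0 and np.linalg.matrix_rank(A[:, cols]) < r:
                continue  # w is not in Omega_r
            u = np.zeros(N)
            if r > 0:
                u[cols] = np.linalg.lstsq(A[:, cols], d, rcond=None)[0]
            # The support may be smaller than w for non-generic data.
            support = tuple(int(i) for i in np.flatnonzero(u))
            value = float(np.sum((A @ u - d) ** 2) + beta * len(support))
            table[support] = (u, value)  # one entry per distinct support
    return sorted(table.values(), key=lambda item: item[1])

rng = np.random.default_rng(7)
A = rng.standard_normal((5, 10))
d = rng.standard_normal(5)
ranked = strict_minimizers(A, d, beta=0.5)
gap = ranked[1][1] - ranked[0][1]  # distance between the two smallest values
print(len(ranked), ranked[0][1], gap)
```

A gap that is large compared with round-off (for instance relative to `ranked[0][1]`) is what "easy to detect" means in practice. For generic data with $M = 5$ and $N = 10$ the list has $\sum_{r=0}^{5} \binom{10}{r} = 638$ entries.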

In all experiments we carried out, the following facts were observed:

• The global minimizer of $\mathcal{F}_d$ was unique; manifestly, data $d$ never belonged to the
closed negligible subset $\Sigma_K$ in Proposition 5.5. This confirms Theorem 5.6.

• The global minimizers of $\mathcal{F}_d$ remained unchanged under large variations of $\beta$.

• The necessary condition for a global minimizer in Proposition 4.1 was met.

Next we present in detail two of these experiments where $\mathcal{F}_d$ is defined using

(61) $A = \begin{pmatrix} 7 & 2 & 4 & 9 & 0 & 3 & 3 & 6 & 6 & 7 \\ 3 & 4 & 9 & 3 & 3 & 9 & 1 & 3 & 1 & 5 \\ 5 & 4 & 2 & 4 & 0 & 7 & 1 & 9 & 2 & 9 \\ 8 & 4 & 0 & 9 & 6 & 0 & 4 & 2 & 3 & 7 \\ 6 & 3 & 6 & 5 & 0 & 9 & 0 & 0 & 3 & 8 \end{pmatrix}, \qquad d = A\mathring{u} + n$,

where $n$ is noise and $\mathring{u} = (0, 1, 8, 0, 3, 0, 0, 0, 0, 9)^T$.


Only integers appear in (61) for better readability. We have $\mathrm{rank}(A) = M = 5$. An exhaustive
combinatorial test shows that the arbitrary matrix $A$ in (61) satisfies H1 for $K = M - 1$. The
values of $\xi_K(A)$ are seen in Table 6.4. One notes that $\mu_2(A) > \mu_1(A)$; hence $\xi_1(A) = \xi_2(A)$.

One expects (at least when data are noise-free) that the global minimizer $\hat u$ of $\mathcal{F}_d$ obeys
$\hat\sigma \subseteq \sigma(\mathring{u})$, where $\mathring{u}$ is the original in (61), and that the vanished entries of $\hat u$ correspond to
the least entries of $\mathring{u}$. This inclusion provides a partial way to rate the quality of the solution
provided by a global minimizer $\hat u$ of $\mathcal{F}_d$.

Table 6.4
The values of $\xi_K(A)$ and $\mu_K(A)$, $\forall\, K \in \mathbb{I}_{M-1}$, for the matrix $A$ in (61).

              K = 1     K = 2     K = 3     K = 4
$\xi_K(A)$    0.2737    0.2737    0.2008    0.0564
$\mu_K(A)$    0.2737    0.2799    0.2008    0.0564


The experiments described hereafter correspond to two data samples relevant to (61),
without and with noise, and to several values of $\beta > 0$.
Noise-free data. The noise-free data in (61) read as:

(62) $d = A\mathring{u} = (97, 130, 101, 85, 123)^T$.

For different values of $\beta$, the global minimizer $\hat u$ is given in Table 6.5.

Table 6.5
The global minimizer $\hat u$ of $\mathcal{F}_d$ and its value $\mathcal{F}_d(\hat u)$ for the noise-free data $d$ in (62) for different values of
$\beta$. Last row: the original $\mathring{u}$ in (61).

$\beta$           The global minimizer $\hat u$ of $\mathcal{F}_d$ (row vector)        $\|\hat u\|_0$   $\mathcal{F}_d(\hat u)$
1                 0   1       8      0   3      0       0       0   0   9       4                4
$10^2$            0   0       8.12   0   3.31   0       0       0   0   9.33    3                301.52
$10^3$            0   0       0      0   0      12.58   20.28   0   0   0       2                2179.3
$10^4$            0   29.95   0      0   0      0       0       0   0   0       1                14144
$7 \cdot 10^4$    0   0       0      0   0      0       0       0   0   0       0                58864
$\mathring{u}$    0   1       8      0   3      0       0       0   0   9

Since $\sigma(\mathring{u}) \in \Omega$ and the data are noise-free, $\mathcal{F}_d$ does not have global minimizers with $\|\hat u\|_0 = 5$. Actually, applying
Proposition 4.5 for $\tilde u = \mathring{u}$ yields $\beta_{M-1} = 0$, hence for any $\beta > 0$ all global minimizers of $\mathcal{F}_d$
have a support in $\Omega = \overline{\Omega}_{M-1}$ (see Definition 3.1 and (43)). The global minimizer $\hat u$ for $\beta = 1$
meets $\hat u = \mathring{u}$. For $\beta = 100$, the global minimizer $\hat u$ obeys $\hat\sigma = \sigma(\hat u) = \{3, 5, 10\} \subsetneq \sigma(\mathring{u})$ and
$\|\hat u\|_0 = 3$; the least nonzero entry of the original $\mathring{u}$ is canceled, which is reasonable. The
global minimizers corresponding to $\beta \geq 300$ are meaningless. We could not find any positive
value of $\beta$ giving better 2-sparse global minimizers. Recalling that data are noise-free, this
confirms Remark 3: the global minimizers of $\mathcal{F}_d$ realize only a pseudo-hard thresholding. For
$\beta \geq 7 \cdot 10^4 > \|d\|^2$, the global minimizer of $\mathcal{F}_d$ is $\hat u = 0$, which confirms Remark 7.
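
The noise-free experiment can be replayed by exhaustive search. The sketch below assumes NumPy and uses the matrix and the original vector of (61) as read from the flattened digits of the source, which should be treated as an assumption; `global_minimizer` and `u_orig` are names introduced here and this is not the author's code.

```python
import itertools
import numpy as np

# Matrix A and original vector of (61), as read from the source (an assumption).
A = np.array([[7, 2, 4, 9, 0, 3, 3, 6, 6, 7],
              [3, 4, 9, 3, 3, 9, 1, 3, 1, 5],
              [5, 4, 2, 4, 0, 7, 1, 9, 2, 9],
              [8, 4, 0, 9, 6, 0, 4, 2, 3, 7],
              [6, 3, 6, 5, 0, 9, 0, 0, 3, 8]], dtype=float)
u_orig = np.array([0, 1, 8, 0, 3, 0, 0, 0, 0, 9], dtype=float)
d = A @ u_orig  # noise-free data, compare with (62)

def global_minimizer(A, d, beta):
    """Exhaustive search: least squares on every support of length 1..M, plus u = 0."""
    M, N = A.shape
    best_u, best_val = np.zeros(N), float(d @ d)  # u = 0 gives F_d(0) = ||d||^2
    for r in range(1, M + 1):
        for w in itertools.combinations(range(N), r):
            cols = list(w)
            u = np.zeros(N)
            u[cols] = np.linalg.lstsq(A[:, cols], d, rcond=None)[0]
            # Round-off entries count as nonzero; this can only overestimate ||u||_0.
            val = float(np.sum((A @ u - d) ** 2) + beta * np.count_nonzero(u))
            if val < best_val:
                best_u, best_val = u, val
    return best_u, best_val

results = {beta: global_minimizer(A, d, beta) for beta in (1.0, 1e2, 1e3, 1e4, 7e4)}
for beta, (u, val) in results.items():
    print(beta, np.round(u, 2), round(val, 2))
```

If the matrix was read correctly, the printed rows should be comparable with Table 6.5; for $\beta = 7 \cdot 10^4 > \|d\|^2 = 58864$ the search necessarily returns the null vector, in line with Remark 7.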



Noisy data. Now we consider noisy data in (61) for

(63) $n = (4, -1, 2, -3, 5)^T$.

This arbitrary noise yields a signal-to-noise ratio$^{24}$ (SNR) equal to 14.07 dB. If $\beta \leq 0.04$,
$\mathcal{F}_d$ has 252 different strict global minimizers $\hat u$ obeying $\|\hat u\|_0 = M$ and $\mathcal{F}_d(\hat u) = \beta M$ (recall
Proposition 3.9). For $\beta \geq 0.05$, the global minimizer $\hat u$ of $\mathcal{F}_d$ is unique and satisfies $\sigma(\hat u) \in \Omega$.
It is given in Table 6.6 for several values of $\beta \geq 0.05$.

Table 6.6
The global minimizer $\hat u$ of $\mathcal{F}_d$ and its value $\mathcal{F}_d(\hat u)$ for noisy data given by (61) and (63), for different values
of $\beta$. Last row: the original $\mathring{u}$.

$\beta$           The global minimizer $\hat u$ of $\mathcal{F}_d$ (row vector)        $\|\hat u\|_0$   $\mathcal{F}_d(\hat u)$
1                 0   6.02   2.66   6.43   0     6.85   0   0   0   0        4                4.0436
$10^2$            0   0      8.23   0      2.3   0      0   0   0   9.71     3                301.94
$10^3$            0   0      8.14   0      0     0      0   0   0   10.25    2                2174.8
$10^4$            0   0      0      0      0     0      0   0   0   14.47    1                14473
$7 \cdot 10^4$    0   0      0      0      0     0      0   0   0   0        0                60559
$\mathring{u}$    0   1      8      0      3     0      0   0   0   9

For $\beta = 1$, the global minimizer is meaningless. We could not find any positive value of $\beta$
yielding a better global minimizer with a 4-length support. For the other values of $\beta$, the global
minimizer $\hat u$ meets $\hat\sigma = \sigma(\hat u) \subsetneq \sigma(\mathring{u})$, and its vanished entries correspond to the least entries in
the original $\mathring{u}$. For $\beta = 100$, the global minimizer seems to furnish a good approximation to $\mathring{u}$.
Observe that the last entry of the global minimizer $\hat u[10]$, corresponding to the largest magnitude
in $\mathring{u}$, freely increases when $\beta$ increases from $10^2$ to $10^4$. We tested a tight sequence of intermediate
values of $\beta$ without finding better results. Yet again, $\beta \geq 7 \cdot 10^4 > \|d\|^2$ leads to a unique null
global minimizer (see Remark 7).

$^{24}$ Let us denote $\mathring{d} = A\mathring{u}$ and $d = \mathring{d} + n$. The SNR reads [41] SNR$(\mathring{d}, d)$



[Figure 6.2: two panels. (a) The values of all strict (local) minimizers of $\mathcal{F}_d$. (b) Zoom of (a) along the y-axis.]

Figure 6.2. All 638 strict (local) minima of $\mathcal{F}_d$ in (61) for $\beta = 100$ and data $d$ corrupted with the arbitrary
noise in (63). The x-axis lists all strict (local) minimizers $\{\hat u\}$ of $\mathcal{F}_d$ sorted according to their $\ell_0$-norm $\|\hat u\|_0$ in
increasing order. (a) The y-axis shows the value $\mathcal{F}_d(\hat u)$ of these minimizers marked with a star. The value of
$\mathcal{F}_d$ for $\hat u = 0$ is not shown because it is too large ($\mathcal{F}_d(0) = 60559 = \|d\|^2$). (b) A zoom of (a) along the y-axis.
It clearly shows that $\mathcal{F}_d$ has a very recognizable unique global minimizer.

Figure 6.2 shows the value $\mathcal{F}_d(\hat u)$ of all the strict local minimizers of $\mathcal{F}_d$ for $\beta = 100$. In
the zoom in Figure 6.2(b) it is easily seen that the global minimizer is unique (remember
Theorem 5.6). It obeys $\|\hat u\|_0 = 3$ and $\mathcal{F}_d(\hat u) = 301.94$. One observes that $\mathcal{F}_d$ has $252 = \sharp\Omega_M$
different strict local minimizers $\hat u$ with $\|\hat u\|_0 = 5 = M$ and $\mathcal{F}_d(\hat u) = 500 = \beta M$. This confirms
Proposition 3.9; obviously $d$ does not belong to the closed negligible subset $Q_M$ described in
the proposition.

7. Conclusions and perspectives. We provided a detailed analysis of the (local and
global) minimizers of a regularized objective $\mathcal{F}_d$ composed of a quadratic data fidelity term
and an $\ell_0$ penalty weighted by a parameter $\beta > 0$, as given in (1). We exhibited easy necessary
and sufficient conditions ensuring that a (local) minimizer $\hat u$ of $\mathcal{F}_d$ is strict (Theorem 3.2).
The global minimizers of $\mathcal{F}_d$ (whose existence was proved) were shown to be strict as well
(Theorem 4.4). Under very mild conditions, $\mathcal{F}_d$ was shown to have a unique global minimizer
(Theorem 5.6). Other interesting results were listed in the abstract. Below we pose some
perspectives and open questions raised by this work.

• The relationship between the value of the regularization parameter $\beta$ and the sparsity
of the global minimizers of $\mathcal{F}_d$ (Proposition 4.5) can be improved.

• The generic linearity in data $d$ of each strict (local) minimizer of $\mathcal{F}_d$ (subsection 3.2)
should be exploited to better characterize the global minimizers of $\mathcal{F}_d$.

• Is there a simple way to check whether assumption H1 is satisfied by a given matrix
$A \in \mathbb{R}^{M \times N}$ when $N$ and $M < N$ are large? Remark 8 and in particular (51) could
help to discard some nonrandom matrices. Conversely, one can ask whether there is a
systematic way to construct matrices $A$ that satisfy H1.

An alternative would be to exhibit families of matrices that satisfy H1 for large values
of $\xi_K(\cdot)$, where the latter quantifiers are defined in equation (58).

• A proper adaptation of the results to matrices $A$ and data $d$ with complex entries
should not present inherent difficulties.

• The theory developed here can be extended to MAP energies of the form evoked in (5).
This is important for the imaging applications mentioned there.

• Based on Corollary 2.5, Remarks 3 and 6, and the numerical tests in subsection
6.2, one is justified in asking for conditions ensuring that the global minimizers of $\mathcal{F}_d$
produce valid results. Given the high quality of the numerical results provided in
many papers (see e.g., [33, 34]), the question deserves attention.

There exist numerous algorithms aimed at approximating a (local) minimizer of $\mathcal{F}_d$. As
a by-product of our research, we obtained simple rules to verify whether or not an algorithm
could find

- a (local) minimizer $\hat u$ of $\mathcal{F}_d$, by checking whether $\hat u$ satisfies (26) in Corollary 2.5;

- and whether this local minimizer is strict, by testing whether the submatrix whose
columns are indexed by the support of $\hat u$ (i.e., $A_{\sigma(\hat u)}$) has full column rank (Theorem 3.2).

Some properties of the minimizers of $\mathcal{F}_d$ given in this work can be inserted in numerical
schemes in order to quickly escape from shallow local minimizers.

Many existing numerical methods involve a studious choice of the regularization parameter
$\beta$, and some of them are proved to converge to a local minimizer of $\mathcal{F}_d$. We have seen that
finding a (strict or nonstrict) local minimizer of $\mathcal{F}_d$ is easy and that it is independent of
the value of $\beta$ (Corollaries 2.5 and 3.3). It is therefore obscure what meaning to attach to
"choosing a good $\beta$ and proving (local) convergence". Other successful algorithms are not
guaranteed to converge to a local minimizer of $\mathcal{F}_d$. Whenever algorithms do a good job, the
choice of $\beta$, the assumptions on $A$ and on $\|\hat u\|_0$, and the iterative scheme and its initialization
obviously provide a tool for selecting a meaningful solution by minimizing $\mathcal{F}_d$. There is a
theoretical gap that needs clarification.

The connection between the existing algorithms and the description of the minimizers
exposed in this paper deserves deep exploration. What conditions ensure that an algorithm
minimizing $\mathcal{F}_d$ yields meaningful solutions? Clearly, showing local convergence does not answer
this important question.

One can expect such research to give rise to innovative and more efficient algorithms
enabling one to compute relevant solutions by minimizing the tricky objective $\mathcal{F}_d$.



8. Appendix. 



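8.1. Proof of Lemma 2.1. Since $\hat u \neq 0$, the definition of $\hat\sigma$ shows that $\min_{i\in\hat\sigma} |\hat u[i]| > 0$. 
Then $\rho$ in (19) fulfills $\rho > 0$. 

(i). Since $\#\hat\sigma \geq 1$, we have 

$$
\begin{aligned}
i \in \hat\sigma,\ v \in B_\infty(0,\rho)
 \;&\Rightarrow\; \max_{j\in\hat\sigma} |v[j]| < \rho \\
 \;&\Rightarrow\; \max_{j\in\hat\sigma} |v[j]| < \min_{j\in\hat\sigma} |\hat u[j]| \\
 \;&\Rightarrow\; |\hat u[i] + v[i]| \geq |\hat u[i]| - |v[i]|
      \geq \min_{j\in\hat\sigma} |\hat u[j]| - \max_{j\in\hat\sigma} |v[j]|
      \geq \rho - \max_{j\in\hat\sigma} |v[j]| > 0 \\
 \;&\Rightarrow\; \hat u[i] + v[i] \neq 0 \\
 \;&\Rightarrow\; \phi(\hat u[i] + v[i]) = \phi(\hat u[i]) = 1 \qquad \text{[by (2)]} .
\end{aligned}
\tag{64}
$$

If $\hat\sigma^c = \varnothing$ the result is proved. Let $\hat\sigma^c \neq \varnothing$. Then $\hat u[i] = 0 = \phi(\hat u[i])$, $\forall\, i \in \hat\sigma^c$. Inserting this 
and (64) into 

$$
\sum_{i\in\mathbb{I}_N} \phi(\hat u[i] + v[i]) = \sum_{i\in\hat\sigma} \phi(\hat u[i] + v[i]) + \sum_{i\in\hat\sigma^c} \phi(\hat u[i] + v[i])
$$

proves claim (i). 

(ii). Using the fact that $\|A(\hat u + v) - d\|^2 = \|A\hat u - d\|^2 + \|Av\|^2 + 2\langle Av, A\hat u - d\rangle$, one 
obtains 

$$
\begin{aligned}
v \in B_\infty(0,\rho)\setminus K_{\hat\sigma} \;\Rightarrow\; \mathcal{F}_d(\hat u + v)
 &= \|A\hat u - d\|^2 + \|Av\|^2 + 2\langle Av, A\hat u - d\rangle
    + \beta\sum_{i\in\hat\sigma}\phi(\hat u[i]) + \beta\sum_{i\in\hat\sigma^c}\phi(v[i])
    && \text{[by Lemma 2.1(i)]} \\
 &= \mathcal{F}_d(\hat u) + \|Av\|^2 + 2\langle Av, A\hat u - d\rangle + \beta\sum_{i\in\hat\sigma^c}\phi(v[i])
    && \text{[using (3)]} \\
 &\geq \mathcal{F}_d(\hat u) - \big|2\langle v, A^T(A\hat u - d)\rangle\big| + \beta\|v_{\hat\sigma^c}\|_0 \\
 &\geq \mathcal{F}_d(\hat u) - 2\|v\|_\infty \|A^T(A\hat u - d)\|_1 + \beta\|v_{\hat\sigma^c}\|_0 .
    && \text{[by Hölder's inequality]}
\end{aligned}
\tag{65}
$$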

If ]\a'^ = 0, then K^- = M , so u G M'^ and ||f ||oo = 0; hence we have the inequahty. 

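Let $\#\hat\sigma^c \geq 1$. For $v \notin K_{\hat\sigma}$, there is at least one index $i \in \hat\sigma^c$ such that $v[i] \neq 0$; hence 
$\|v_{\hat\sigma^c}\|_0 \geq 1$. The definition of $\rho$ in (19) shows that 

$$
\begin{aligned}
v \in B_\infty(0,\rho)\setminus K_{\hat\sigma}
 \;&\Rightarrow\; -\|v\|_\infty > -\rho \geq -\frac{\beta}{2\big(\|A^T(A\hat u - d)\|_1 + 1\big)} \\
 \;&\Rightarrow\; -2\|v\|_\infty \|A^T(A\hat u - d)\|_1 + \beta\|v_{\hat\sigma^c}\|_0
      \geq -\frac{2\beta\,\|A^T(A\hat u - d)\|_1}{2\big(\|A^T(A\hat u - d)\|_1 + 1\big)} + \beta > 0 .
\end{aligned}
$$

Introducing the last inequality into (65) shows that for $\#\hat\sigma^c \geq 1$, the inequality in (ii) is strict. 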
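
Both claims of the lemma can be checked numerically on a toy instance. The sketch below is only an illustration. It assumes that $\rho$ in (19) is the smaller of the two quantities the proof bounds it by, namely $\min_{i\in\sigma}|u[i]|$ and $\beta / \big(2(\|A^T(Au-d)\|_1+1)\big)$. The helper names and the data are hypothetical.

```python
def l0(u, tol=1e-12):
    """Number of nonzero entries of u."""
    return sum(1 for x in u if abs(x) > tol)

def residual(A, d, u):
    """A u - d for a matrix stored as a list of rows."""
    return [sum(a * x for a, x in zip(row, u)) - di for row, di in zip(A, d)]

def objective(A, d, u, beta):
    """F_d(u) = ||A u - d||^2 + beta * ||u||_0."""
    return sum(r * r for r in residual(A, d, u)) + beta * l0(u)

def radius(A, d, u, beta):
    """Assumed form of rho in (19), read off from the bounds used in the proof."""
    r = residual(A, d, u)
    grad_l1 = sum(abs(sum(row[j] * rj for row, rj in zip(A, r))) for j in range(len(u)))
    smallest = min(abs(x) for x in u if x != 0.0)
    return min(smallest, beta / (2.0 * (grad_l1 + 1.0)))

# Toy data (M = 2, N = 3); u has support {0}.
A = [[1.0, 0.0, 1.0],
     [0.0, 1.0, 1.0]]
d = [1.0, 2.0]
beta = 1.0
u = [1.0, 0.0, 0.0]
rho = radius(A, d, u, beta)

# A perturbation inside the open ball B_inf(0, rho) with a nonzero entry off the support.
v = [0.05, 0.05, 0.0]
u_plus_v = [x + y for x, y in zip(u, v)]
off_support = [v[i] for i in range(len(u)) if u[i] == 0.0]

print(rho)
print(l0(u_plus_v), l0(u) + l0(off_support))                          # claim (i)
print(objective(A, d, u_plus_v, beta), objective(A, d, u, beta))      # claim (ii)
```

The perturbation lowers the quadratic term slightly, but the extra nonzero entry costs a full $\beta$, which is what makes the inequality in (ii) strict here.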

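8.2. Proof of Proposition 4.1. If $\hat u = 0$, the statement is obvious. We focus on $\hat u \neq 0$. 
For an arbitrary $i \in \mathbb{I}_N$, define 

$$
\hat u^{(i)} \overset{\mathrm{def}}{=} \big(\hat u[1], \dots, \hat u[i-1],\, 0,\, \hat u[i+1], \dots, \hat u[N]\big) \in \mathbb{R}^N .
$$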

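We shall use the equivalent formulation of $\mathcal{F}_d$ given in (3). Clearly$^{15}$ 

$$
\mathcal{F}_d(\hat u) = \mathcal{F}_d\big(\hat u^{(i)} + e_i \hat u[i]\big)
 = \big\|A\hat u^{(i)} + a_i \hat u[i] - d\big\|^2 + \beta\sum_{j\in\mathbb{I}_N} \phi\big(\hat u^{(i)}[j]\big) + \beta\phi(\hat u[i]) .
$$
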
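Consider $f : \mathbb{R} \to \mathbb{R}$ as given below 

$$
f(t) \overset{\mathrm{def}}{=} \mathcal{F}_d\big(\hat u^{(i)} + e_i t\big) . \tag{66}
$$

Since $\hat u$ is a global minimizer of $\mathcal{F}_d$, for any $i \in \mathbb{I}_N$, we have 

$$
f(\hat u[i]) = \mathcal{F}_d\big(\hat u^{(i)} + e_i \hat u[i]\big) \leq \mathcal{F}_d\big(\hat u^{(i)} + e_i t\big) = f(t) \quad \forall\, t \in \mathbb{R} .
$$

$^{15}$Using the definition of $\hat u^{(i)}$, we have $A\hat u^{(i)} = A_{(\mathbb{I}_N\setminus\{i\})}\,\hat u_{(\mathbb{I}_N\setminus\{i\})}$; hence $A\hat u^{(i)}$ is independent of $\hat u[i]$. 

Equivalently, for any $i \in \mathbb{I}_N$, $f(\hat u[i])$ is the global minimum of $f(t)$ on $\mathbb{R}$. Below we will 
determine the global minimizer(s) $\hat t = \hat u[i]$ of $f$ as given in (66), i.e., 

$$
\hat t = \hat u[i] = \arg\min_{t\in\mathbb{R}} f(t) .
$$

In detail, the function $f$ reads as 

$$
\begin{aligned}
f(t) &= \big\|A\hat u^{(i)} + a_i t - d\big\|^2 + \beta\sum_{j\in\mathbb{I}_N}\phi\big(\hat u^{(i)}[j]\big) + \beta\phi(t) \\
     &= \big\|A\hat u^{(i)} - d\big\|^2 + \|a_i\|^2 t^2 + 2t\,\langle a_i, A\hat u^{(i)} - d\rangle + \beta\sum_{j\in\mathbb{I}_N}\phi\big(\hat u^{(i)}[j]\big) + \beta\phi(t) \\
     &= \|a_i\|^2 t^2 + 2t\,\langle a_i, A\hat u^{(i)} - d\rangle + \beta\phi(t) + C ,
\end{aligned}
\tag{67}
$$

where 

$$
C = \big\|A\hat u^{(i)} - d\big\|^2 + \beta\sum_{j\in\mathbb{I}_N}\phi\big(\hat u^{(i)}[j]\big) .
$$

Note that $C$ does not depend on $t$. The function $f$ has two local minimizers denoted by $\hat t_0$ 
and $\hat t_1$. The first is 

$$
\hat t_0 = 0 \quad\Rightarrow\quad f(\hat t_0) = C . \tag{68}
$$

The other one, $\hat t_1 \neq 0$, corresponds to $\phi(t) = 1$. From (67), $\hat t_1$ solves 

$$
2\|a_i\|^2 t + 2\langle a_i, A\hat u^{(i)} - d\rangle = 0 .
$$

Recalling that $a_i \neq 0$, $\forall\, i \in \mathbb{I}_N$ (see (8)), it follows that 

$$
\hat t_1 = -\frac{\langle a_i, A\hat u^{(i)} - d\rangle}{\|a_i\|^2}
\quad\Rightarrow\quad
f(\hat t_1) = -\frac{\langle a_i, A\hat u^{(i)} - d\rangle^2}{\|a_i\|^2} + \beta + C . \tag{69}
$$

Next we check whether $\hat t_0$ or $\hat t_1$ is a global minimizer of $f$. From (68) and (69) we get 

$$
f(\hat t_0) - f(\hat t_1) = \frac{\langle a_i, A\hat u^{(i)} - d\rangle^2}{\|a_i\|^2} - \beta .
$$

Furthermore, 

$$
\begin{aligned}
f(\hat t_0) < f(\hat t_1) \;&\Rightarrow\; \hat u[i] = \hat t_0 = 0 , \\
f(\hat t_1) < f(\hat t_0) \;&\Rightarrow\; \hat u[i] = \hat t_1 = -\frac{\langle a_i, A\hat u^{(i)} - d\rangle}{\|a_i\|^2} , \\
f(\hat t_0) = f(\hat t_1) \;&\Rightarrow\; \hat t_0 \text{ and } \hat t_1 \text{ are global minimizers of } f .
\end{aligned}
\tag{70}
$$

In particular, we have 

$$
f(\hat t_1) \leq f(\hat t_0) \;\Leftrightarrow\; \langle a_i, A\hat u^{(i)} - d\rangle^2 \geq \beta\|a_i\|^2 \tag{71}
$$

$$
\Rightarrow\quad |\hat u[i]| \overset{\text{by (70)}}{=} \frac{\big|\langle a_i, A\hat u^{(i)} - d\rangle\big|}{\|a_i\|^2}
 \overset{\text{by (71)}}{\geq} \frac{\sqrt{\beta}\,\|a_i\|}{\|a_i\|^2} = \frac{\sqrt{\beta}}{\|a_i\|} .
$$

It is clear that the conclusion holds true for any $i \in \mathbb{I}_N$. 
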
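
The scalar argument above can be sketched in a few lines. This is an illustration only. The function name is hypothetical, `b` stands for $\langle a_i, A\hat u^{(i)} - d\rangle$, and the constant $C$ of (67) is dropped because it does not affect the comparison between the two candidates.

```python
import math

def scalar_global_minimizers(a_norm_sq, b, beta):
    """Global minimizer(s) of f(t) = a_norm_sq * t^2 + 2 * t * b + beta * phi(t) + C."""
    t1 = -b / a_norm_sq                 # nonzero candidate, see (69)
    f0 = 0.0                            # f(0) - C, see (68)
    f1 = -b * b / a_norm_sq + beta      # f(t1) - C, see (69)
    if f0 < f1:
        return [0.0]
    if f1 < f0:
        return [t1]
    return [0.0, t1]                    # tie: both candidates are global minimizers

# Toy values: ||a_i||^2 = 4 and beta = 1, so the bound sqrt(beta) / ||a_i|| equals 0.5.
a_norm_sq, beta = 4.0, 1.0
bound = math.sqrt(beta) / math.sqrt(a_norm_sq)
for b in (-3.0, -2.0, -1.0):
    minimizers = scalar_global_minimizers(a_norm_sq, b, beta)
    # Every nonzero global minimizer respects the lower bound of the proposition.
    assert all(t == 0.0 or abs(t) >= bound for t in minimizers)
    print(b, minimizers)
```

For these values, a correlation of magnitude below 2 sends the entry to zero, a magnitude above 2 keeps it at a value of at least 0.5 in absolute value, and a magnitude of exactly 2 is the tie case in (70).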
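8.3. Proof of Proposition 4.3. The asymptotic function $(\mathcal{F}_d)_\infty(v)$ of $\mathcal{F}_d$ can be calculated 
according to$^{16}$ [3, Theorem 2.5.1] 

$$
(\mathcal{F}_d)_\infty(v) = \liminf_{\substack{v' \to v \\ t \to \infty}} \frac{\mathcal{F}_d(t v')}{t} .
$$

Then 

$$
\begin{aligned}
(\mathcal{F}_d)_\infty(v)
 &= \liminf_{\substack{v' \to v \\ t \to \infty}} \frac{\|A t v' - d\|^2 + \beta\|t v'\|_0}{t} \\
 &= \liminf_{\substack{v' \to v \\ t \to \infty}} \left( t\|Av'\|^2 - 2\langle d, Av'\rangle + \frac{\|d\|^2 + \beta\|v'\|_0}{t} \right) \\
 &= \begin{cases} 0 & \text{if } v \in \ker(A) , \\ +\infty & \text{if } v \notin \ker(A) . \end{cases}
\end{aligned}
$$

Hence 

$$
\ker\big((\mathcal{F}_d)_\infty\big) = \ker(A) , \tag{72}
$$

where $\ker\big((\mathcal{F}_d)_\infty\big) = \{v \in \mathbb{R}^N : (\mathcal{F}_d)_\infty(v) = 0\}$. 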

Let $\{v_k\}$ satisfy (37) with $v_k \|v_k\|^{-1} \to v \in \ker(A)$. Below we compare the numbers $\|v_k\|_0$ 
and $\|v_k - \rho v\|_0$ where $\rho > 0$. There are two options. 

1. Consider that $i \in \sigma(v)$, that is, $v[i] = \lim_{k\to\infty} v_k[i]\,\|v_k\|^{-1} \neq 0$. Then $|v_k[i]| > 0$ for all 
but finitely many $k$ as otherwise, $v_k[i]\,\|v_k\|^{-1}$ would converge to 0. Therefore, there 
exists $k_i$ such that 

$$
|v_k[i] - \rho\, v[i]| \geq 0 \quad\text{and}\quad |v_k[i]| > 0 \quad \forall\, k \geq k_i . \tag{73}
$$

2. If $i \in (\sigma(v))^c$, i.e. $v[i] = 0$, then clearly 

$$
v_k[i] - \rho\, v[i] = v_k[i] . \tag{74}
$$

Combining (73) and (74), the definition of $\|\cdot\|_0$ using $\phi$ in (2) shows that 

$$
\|v_k - \rho v\|_0 \leq \|v_k\|_0 \quad \forall\, k \geq k_0 \overset{\mathrm{def}}{=} \max_{i\in\sigma(v)} k_i . \tag{75}
$$

By (72), $Av = 0$. This fact, jointly with (75), entails that 

$$
\begin{aligned}
\mathcal{F}_d(v_k - \rho v) &= \|A(v_k - \rho v) - d\|^2 + \beta\|v_k - \rho v\|_0 \\
 &= \|Av_k - d\|^2 + \beta\|v_k - \rho v\|_0 \\
 &\leq \|Av_k - d\|^2 + \beta\|v_k\|_0 = \mathcal{F}_d(v_k) \quad \forall\, k \geq k_0 .
\end{aligned}
$$

$^{16}$In the nonconvex case, the notion of asymptotic functions and the representation formula were first given 
by J.P. Dedieu [12]. 

It follows that for any $k \geq k_0$ we have 

$$
v_k \in \mathrm{lev}(\mathcal{F}_d, \lambda_k) \;\Rightarrow\; v_k - \rho v \in \mathrm{lev}(\mathcal{F}_d, \lambda_k) ,
$$

and thus $\mathcal{F}_d$ satisfies Definition 4.2. 
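
The key step of this proof, namely that shifting an unbounded sequence by a multiple of its limit direction in $\ker(A)$ does not increase the objective, can be observed on a toy instance. The sketch below is only an illustration: the matrix, the sequence and the value of $\rho$ are hypothetical choices, and a finite range of indices of course proves nothing about the limit.

```python
import math

def l0(u, tol=1e-12):
    """Number of nonzero entries of u."""
    return sum(1 for x in u if abs(x) > tol)

def objective(A, d, u, beta):
    """F_d(u) = ||A u - d||^2 + beta * ||u||_0."""
    residual = [sum(a * x for a, x in zip(row, u)) - di for row, di in zip(A, d)]
    return sum(r * r for r in residual) + beta * l0(u)

A = [[1.0, 0.0, 1.0],
     [0.0, 1.0, 1.0]]
d = [1.0, 2.0]
beta, rho = 1.0, 0.7

# Unit vector spanning ker(A) for this particular A.
s = 1.0 / math.sqrt(3.0)
v = [s, s, -s]

def v_k(k):
    """Unbounded sequence whose normalized terms tend to v (hypothetical choice)."""
    return [k + 1.0, float(k), -float(k)]

# Difference F_d(v_k - rho * v) - F_d(v_k) for a range of indices k.
gaps = [
    objective(A, d, [x - rho * y for x, y in zip(v_k(k), v)], beta)
    - objective(A, d, v_k(k), beta)
    for k in range(1, 50)
]
print(max(gaps))  # expected to be zero up to rounding
```

Here the shift leaves both the residual and the support unchanged, so the two objective values coincide up to rounding, in agreement with the chain of (in)equalities that follows (75).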

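8.4. Proof of Proposition 4.5. Given $K \in \mathbb{I}_{M-1}$, set 

$$
U_{K+1} \overset{\mathrm{def}}{=} \bigcup_{\omega\subset\mathbb{I}_N} \big\{\bar u : \bar u \text{ solves } (\mathcal{P}_\omega) \text{ and } \|\bar u\|_0 \geq K+1\big\} . \tag{76}
$$

• Let $U_{K+1} \neq \varnothing$. By Proposition 2.3, for any $\beta > 0$, $\mathcal{F}_d$ has a (local) minimum at each 
$\bar u \in U_{K+1}$. Thus 

$$
\bar u \text{ is a (local) minimizer of } \mathcal{F}_d \text{ and } \|\bar u\|_0 \geq K+1 \;\Leftrightarrow\; \bar u \in U_{K+1} . \tag{77}
$$

Then for any $\beta > 0$ 

$$
\mathcal{F}_d(\bar u) \geq \beta(K+1) \quad \forall\, \bar u \in U_{K+1} . \tag{78}
$$

Let $\tilde u$ be defined by$^{17}$: 

$$
\tilde u \text{ solves } (\mathcal{P}_\omega) \text{ for some } \omega \in \Omega_K .
$$

Then 

$$
\|\tilde u\|_0 \leq K . \tag{79}
$$

Set $\beta$ and $\beta_K$ according to 

$$
\beta > \beta_K \overset{\mathrm{def}}{=} \|A\tilde u - d\|^2 . \tag{80}
$$

For such a $\beta$ we have 

$$
\begin{aligned}
\mathcal{F}_d(\tilde u) &= \|A\tilde u - d\|^2 + \beta\|\tilde u\|_0 \\
 &< \beta + \beta K = \beta(K+1) && \text{[by (79) and (80)]} \\
 &\leq \mathcal{F}_d(\bar u) \quad \forall\, \bar u \in U_{K+1} . && \text{[by (78)]}
\end{aligned}
$$

Let $\hat u$ be a global minimizer of $\mathcal{F}_d$. Then 

$$
\mathcal{F}_d(\hat u) \leq \mathcal{F}_d(\tilde u) < \mathcal{F}_d(\bar u) \quad \forall\, \bar u \in U_{K+1} .
$$

Using (76)-(77), we find 

$$
\|\hat u\|_0 \leq K .
$$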

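• $U_{K+1} = \varnothing$ entails that$^{18}$ 

$$
\bar u \text{ solves } (\mathcal{P}_\omega) \text{ for } \omega \subset \mathbb{I}_N,\ \#\omega \geq K+1 \;\Rightarrow\; \|\bar u\|_0 \leq K . \tag{81}
$$

Let $\hat u$ be a global minimizer of $\mathcal{F}_d$. By (81) we have 

$$
\|\hat u\|_0 \leq K .
$$

According to Theorem 4.4(ii), any global minimizer of $\mathcal{F}_d$ is strict, hence $\sigma(\hat u) \in \overline{\Omega}_K$. 

$^{17}$Such a $\tilde u$ always exists; see subsection 1.1. By Proposition 2.3 and Theorem 3.2, it is uniquely defined. 
$^{18}$Let $A = (e_1, e_2, e_3, e_4, e_1) \in \mathbb{R}^{4\times 5}$ and $d = e_1 \in \mathbb{R}^4$. For $K = M - 1 = 3$ one can check that $U_{K+1} = \varnothing$. 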
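
The role of the critical value $\beta_K$ in (80) can be illustrated on a toy problem with $M = 2$, $N = 3$ and $K = 1$. The sketch below is not a general solver and is only an illustration under stated assumptions: the function names are hypothetical, single-column least squares is solved in closed form, and the brute-force search relies on every pair of columns of this particular $A$ being linearly independent, so that any two columns fit $d$ exactly.

```python
def dot(x, y):
    """Euclidean inner product of two vectors."""
    return sum(a * b for a, b in zip(x, y))

def beta_K_single_column(col, d):
    """beta_K = ||A u~ - d||^2 when u~ is the least-squares solution on one column (K = 1)."""
    return dot(d, d) - dot(col, d) ** 2 / dot(col, col)

def sparsity_of_global_minimizer(cols, d, beta):
    """Toy brute force, valid for M = 2 and pairwise independent columns only."""
    candidates = [(dot(d, d), 0)]                                          # u = 0
    candidates += [(beta_K_single_column(c, d) + beta, 1) for c in cols]   # one column
    candidates.append((2.0 * beta, 2))                                     # two columns, zero residual
    return min(candidates)[1]                                              # sparsity of the best candidate

# Columns of a toy 2 x 3 matrix A and the data d.
cols = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
d = (1.0, 2.0)

beta_1 = beta_K_single_column(cols[2], d)   # uses the support {3}, a hypothetical choice
print(beta_1)
print(sparsity_of_global_minimizer(cols, d, beta_1 + 0.1))   # beta above the critical value
print(sparsity_of_global_minimizer(cols, d, beta_1 - 0.1))   # beta below the critical value
```

Above the critical value the best candidate has a single nonzero entry, as the proposition guarantees. Below it, this particular instance switches to a two-entry solution, although the proposition itself only states that $\beta > \beta_K$ is sufficient.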